COMPOSITE BINDING POLYPEPTIDES

TECHNICAL FIELD 

The present disclosure is in the fields of molecular biology and protein design; in
particular, the design of sequence-specific binding proteins for regulation of gene
expression.

BACKGROUND

Protein-nucleic acid recognition is a commonplace phenomenon that is central to a large
number of biomolecular control mechanisms that regulate the functioning of eukaryotic
and prokaryotic cells. For instance, protein-DNA interactions form the basis of the
regulation of gene expression and are thus one of the subjects most widely studied by
molecular biologists.

A wealth of biochemical and structural information explains the details of protein-DNA
recognition in numerous instances, to the extent that general principles of recognition
have emerged. Many DNA-binding proteins contain independently folded domains for
the recognition of DNA, and these domains in turn belong to a large number of structural
families, such as the leucine zipper, the "helix-turn-helix" and zinc finger families.

Despite the great variety of structural domains, the specificity of the interactions observed
to date between protein and DNA most often derives from the complementarity of the
surfaces of a protein α-helix and the major groove of DNA. See, e.g., Klug, (1993) Gene
135:83-92. In light of the recurring physical interaction of α-helix and major groove, the
tantalising possibility arises that the contacts between particular amino acids and DNA
bases could be described by a simple set of rules; in effect a stereochemical recognition
code which relates protein primary structure to binding-site sequence preference.

It is clear, however, that no code will be found which can describe DNA recognition by
all DNA-binding proteins. The structures of numerous complexes show significant
differences in the way that the recognition α-helices of DNA-binding proteins from
different structural families interact with the major groove of DNA, thus precluding
similarities in patterns of recognition. The majority of known DNA-binding motifs are
not particularly versatile, and any codes which might emerge would likely describe
binding to a very few related DNA sequences.



Even within each family of DNA-binding proteins, moreover, it has hitherto appeared
that the deciphering of a code would be elusive. Due to the complexity of the protein-DNA
interaction, there does not appear to be a simple "alphabetic" equivalence between
the primary structures of protein and nucleic acid which specifies a direct amino acid to
base relationship.



International patent application WO 96/06166 addresses this issue and provides a
"syllabic" code that explains protein-DNA interactions for zinc finger nucleic acid
binding proteins. A syllabic code is a code that relies on more than one feature of the
binding protein to specify binding to a particular base, the features being combinable in
the forms of "syllables", or complex instructions, to define each specific contact. Segal,
D. J., Dreier, B., Beerli, R. R. & Barbas, C. F. (1999) Proc. Natl. Acad. Sci. USA 96,
2758-2763 present a method of constructing zinc finger polypeptides, based on 16
individual zinc finger domains which bind sequences of the form 5'-GXX-3', where X is
any base. See also U.S. Patent No. 6,140,081. The latter method has the severe
limitation that it does not provide instructions permitting the specific targeting of triplets
containing nucleotides other than G in the 5' position of each triplet, which greatly
restricts the potential target sequences of such generated zinc finger peptides.

International patent application WO98/53057 addresses the above problems by
recognizing that zinc fingers can specify overlapping 4 bp subsites, and therefore synergy
between adjacent zinc finger domains is an important consideration in selecting zinc
finger nucleic acid-binding domains to specifically target any sequence.

With the recent completion of the human genome project and the rapidly advancing fields
of transgenic animals and plants, thousands of uncharacterised (and characterised) genes
have (and will) become valid targets for functional genomics and other such projects.
Concomitantly, 'designer' zinc finger peptides are emerging as one of the most universal
and desirable ways of regulating the expression of specific genes within cells. See, for
example, Choo, Y., Sanchez-Garcia, I. & Klug, A. (1994) Nature 372: 642-645; Beerli,
R. R., Dreier, B. & Barbas, C. F. III (2000) Proc. Natl. Acad. Sci. USA 97: 1495-1500;
Kim, J-S. & Pabo, C. O. (1998) Proc. Natl. Acad. Sci. USA 95: 2812-2817; Kang, J. S. &
Kim, J-S. (2000) J. Biol. Chem. 275: 8742-8748; Zhang et al. (2000) J. Biol. Chem.
275:33,850-33,860; Liu et al. (2001) J. Biol. Chem. 276:11,323-11,334; and Ren et al.
(2002) Genes Dev. 16:27-32. See also WO 00/41566 and WO 01/19981. Hence, a
rapid method of creating multi-zinc finger peptides for the up- or down-regulation of any
specific gene is highly desirable.

As stated above, synergy between adjacent zinc finger peptides is an important factor in
specific DNA recognition. Moreover, the findings reported in co-owned WO 01/53480,
which is hereby incorporated by reference, demonstrate that poly-zinc finger peptides
constructed from strings of 2-finger domains can provide greater DNA binding
specificity.

Traditional strategies of zinc finger mutagenesis and selection, such as phage display,
particularly if employed for the selection of 2-zinc finger units to target any desired
binding site, are limited by the size of the library that can be cloned into host/vector
systems, such as phage. Due to limitations in library size imposed by such constraints, it
is impossible to include an exhaustive combination of randomisations to cover all
potentially important sequence-space. Furthermore, for important applications of
engineered zinc finger peptides, such as for gene therapy or transgenic animal systems,
engineered zinc finger peptides run the significant risk of eliciting a harmful
immunological reaction in the host animal.

The human genome sequencing project has also revealed the presence of almost 700
endogenous zinc finger-containing proteins. Assuming that each of these proteins
contains at least 2 finger modules, there are probably at least 2,000 natural zinc finger
modules in the human genome alone. Similar numbers are expected in other animal and
plant genomes.

SUMMARY

The present invention recognises the potential importance of designer zinc finger peptides
in therapeutic and transgenic applications in animals and plants. Furthermore, the present
invention acknowledges that the safety of such applications is of primary importance.

The present invention provides the isolation of natural zinc finger modules, from
genomes such as human, mouse, chicken, arabidopsis and other species, and the
construction of non-natural combinations of such zinc finger modules, to create
multi-finger domains, and to provide and determine novel nucleic acid binding specificities.

Such a procedure will allow the identification of novel zinc finger domains that bind
any desired nucleic acid sequence, particularly sequences of between 6 and 10
nucleotides long. The first advantage of such technology is that millions of years of
natural evolution, to create specific nucleotide-binding zinc finger modules, are captured
to create novel nucleic acid-binding domains. Also, use of poly-zinc finger peptides
constructed from such units for targeted gene regulation avoids the potentially harmful
effects of host immune responses. The present invention thus greatly enhances the
possibilities for the use of zinc finger transcription factors for in vivo applications, such as
gene therapy and transgenic animals.

In a first aspect, therefore, there is provided a composite binding polypeptide comprising
a first natural binding domain derived from a first natural binding polypeptide, and a
second natural binding domain derived from a second natural binding polypeptide,
wherein said first and second natural binding polypeptides may be the same or different;
which polypeptide binds to a target, said target differing from the natural target of
both the first and the second binding polypeptides.

Preferably, said first and second natural binding polypeptides are different polypeptides. 



WO 02/099084



PCT/US02/22272 




Binding polypeptides according to the invention comprise two or more natural binding
domains, advantageously three or more natural binding domains; advantageously, six or
more domains are included. These are preferably arranged in a 3x2 conformation,
separated by linker sequences.

The binding domains are preferably nucleic acid binding domains, and the composite
polypeptide is preferably a nucleic acid binding polypeptide. Most preferably, the
composite polypeptide is a zinc finger polypeptide, and the natural binding domains are
zinc finger domains.

Zinc finger binding domains can comprise any type of zinc finger or zinc-coordinated
structure including, but not limited to, Cys2-His2 (SEQ ID NO:1) zinc finger binding
domains or Cys3-His (SEQ ID NO:2) zinc finger binding domains.

In a further aspect, there is provided a library of natural binding domains. The natural
binding domains are the domains that may be assembled into polypeptides according to
the previous aspect of the invention. Preferably, the library is of natural zinc finger
nucleic acid binding domains.

Said zinc finger domains may comprise a linker attached thereto. Any linker amino acid
sequence known in the art can be used. Advantageously, the linker comprises the amino
acid sequence TGEKP (SEQ ID NO:3).

In a further aspect, the invention provides a method for selecting a binding polypeptide
capable of binding to a target site, comprising:

(a) providing a library of natural binding domains;
(b) assembling two or more of said domains to form a composite polypeptide;
(c) screening said composite polypeptide against the target site in order to
determine its ability to bind the target site.
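
Steps (a) and (b) of this selection method can be illustrated with a minimal Python sketch that enumerates every ordered combination of library domains and joins them with the TGEKP linker. The function name, the toy library and its placeholder sequences are hypothetical and are not taken from the disclosure; step (c), the binding screen, is experimental and is not modelled.

```python
from itertools import product

LINKER = "TGEKP"  # linker sequence named in the text (SEQ ID NO:3)

def assemble_composites(library, n_domains=2, linker=LINKER):
    """Yield (domain names, joined sequence) for each ordered combination.

    Repeats are allowed, since the source domains may be the same or different.
    """
    for combo in product(sorted(library), repeat=n_domains):
        sequence = linker.join(library[name] for name in combo)
        yield combo, sequence

# Hypothetical placeholder domains; these are not real zinc finger sequences.
toy_library = {"F1": "AAAA", "F2": "CCCC", "F3": "DDDD"}
composites = list(assemble_composites(toy_library, n_domains=2))
```

With three library domains and two positions the sketch yields nine candidates, each of which would then go forward to screening against the target site.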

Preferably, the natural binding domains are zinc finger binding domains.

Furthermore, the invention provides methods for designing a composite binding
polypeptide, comprising:

(a) providing information defining a target site;
(b) selecting, from a database of natural binding domains, a sequence of binding
domains, separated by linker sequences, which is predicted to bind to the target site;
(c) displaying the sequence of binding domains and linkers and optionally
assembling the binding polypeptide from a library of said domains.

In certain embodiments, the binding domains are zinc finger domains. In certain
embodiments, a binding domain sequence that will bind a particular target site is
predicted by the application of one or more rules that define target binding interactions
for the binding domains. In additional embodiments, a nucleotide sequence encoding the
binding domains is assembled and introduced into a cell such that the composite binding
polypeptide is expressed.

In one embodiment, zinc fingers can be considered to bind to a nucleic acid triplet, in
which case domains can be selected according to one or more of the following rules:

(a) if the 5' base in the triplet is G, then position +6 in the α-helix is Arg; or
position +6 is Ser or Thr and position ++2 is Asp;
(b) if the 5' base in the triplet is A, then position +6 in the α-helix is Gln and ++2
is not Asp;
(c) if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr
and position ++2 is Asp;
(d) if the 5' base in the triplet is C, then position +6 in the α-helix may be any
amino acid, provided that position ++2 in the α-helix is not Asp;
(e) if the central base in the triplet is G, then position +3 in the α-helix is His;
(f) if the central base in the triplet is A, then position +3 in the α-helix is Asn;
(g) if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser
or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;
(h) if the central base in the triplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;
(i) if the 3' base in the triplet is G, then position -1 in the α-helix is Arg;
(j) if the 3' base in the triplet is A, then position -1 in the α-helix is Gln;
(k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn or Gln;
(l) if the 3' base in the triplet is C, then position -1 in the α-helix is Asp.
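
The triplet rules can be read as a predicate on the helix residues of a candidate finger. The following Python sketch is a strict variant that requires the rules for all three bases to hold, whereas the text only asks for "one or more". It rests on two labelled assumptions: "++2" is taken to be the +2 residue of the neighbouring finger, and "small residue" is taken to mean Gly, Ala or Ser, which this passage does not define. The function and variable names are hypothetical.

```python
SMALL = frozenset("AGS")  # assumption: "small residue" is not defined in this passage

def matches_triplet(triplet, helix, next_plus2):
    """Check a finger against a 5'->3' triplet using rules (a)-(l).

    helix maps helix positions -1, 3 and 6 to one-letter residue codes.
    next_plus2 is the residue written "++2" in the rules (assumed here to be
    position +2 of the neighbouring finger).
    """
    five, mid, three = triplet
    m1, p3, p6 = helix[-1], helix[3], helix[6]
    five_ok = {
        "G": p6 == "R" or (p6 in "ST" and next_plus2 == "D"),  # rule (a)
        "A": p6 == "Q" and next_plus2 != "D",                   # rule (b)
        "T": p6 in "ST" and next_plus2 == "D",                  # rule (c)
        "C": next_plus2 != "D",                                 # rule (d)
    }[five]
    mid_ok = {
        "G": p3 == "H",                                         # rule (e)
        "A": p3 == "N",                                         # rule (f)
        "T": p3 in "SV" or (p3 == "A" and (m1 in SMALL or p6 in SMALL)),  # rule (g)
        "C": p3 in "SDELTV",                                    # rule (h)
    }[mid]
    three_ok = {
        "G": m1 == "R",                                         # rule (i)
        "A": m1 == "Q",                                         # rule (j)
        "T": m1 in "NQ",                                        # rule (k)
        "C": m1 == "D",                                         # rule (l)
    }[three]
    return five_ok and mid_ok and three_ok
```

For example, a helix with Arg at -1, His at +3 and Arg at +6 satisfies the rules for the triplet GGG regardless of the ++2 residue, because rule (a) accepts Arg at +6 on its own.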

In a further embodiment, the zinc fingers can be considered to bind to a nucleic acid
quadruplet and domains can be selected according to one or more of the following rules:

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg or Lys;
(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Glu, Asn or
Val;
(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser, Thr, Val
or Lys;
(d) if base 4 in the quadruplet is C, then position +6 in the α-helix is Ser, Thr, Val,
Ala, Glu or Asn;
(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;
(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;
(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;
(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;
(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;
(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;
(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is His or Thr;
(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp or His;
(m) if base 1 in the quadruplet is G, then position +2 is Glu;
(n) if base 1 in the quadruplet is A, then position +2 is Arg or Gln;
(o) if base 1 in the quadruplet is C, then position +2 is Asn, Gln, Arg, His or Lys;
(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.
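
Because this rule set pairs each base of the quadruplet with one helix position, it can be held as a plain lookup table. The Python sketch below, with hypothetical names, returns the residues permitted at each helix position for a given quadruplet. Bases are passed by their number (1 to 4) so that no assumption about strand orientation is needed, and the proviso attached to Ala in rule (g) is not modelled.

```python
# Helix position -> base -> permitted one-letter residues, from rules (a)-(p).
RULES = {
    6:  {"G": "RK", "A": "ENV", "T": "STVK", "C": "STVAEN"},  # base 4, rules (a)-(d)
    3:  {"G": "H", "A": "N", "T": "ASV", "C": "SDELTV"},      # base 3, rules (e)-(h)
    -1: {"G": "R", "A": "Q", "T": "HT", "C": "DH"},           # base 2, rules (i)-(l)
    2:  {"G": "E", "A": "RQ", "C": "NQRHK", "T": "ST"},       # base 1, rules (m)-(p)
}
POSITION_FOR_BASE = {4: 6, 3: 3, 2: -1, 1: 2}

def allowed_residues(bases):
    """Map each helix position to the residue set permitted for the given bases.

    bases maps base number (1-4) to a nucleotide letter.
    The Ala proviso of rule (g) is deliberately left out of this sketch.
    """
    result = {}
    for number, nucleotide in bases.items():
        position = POSITION_FOR_BASE[number]
        result[position] = set(RULES[position][nucleotide])
    return result
```

A library of natural fingers could then be filtered by checking each finger's residues at -1, +2, +3 and +6 against the returned sets.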
In a preferred embodiment, zinc fingers are considered to bind to a nucleic acid
quadruplet and domains are selected according to one or more of the following rules:

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or
position +6 is Ser or Thr and position ++2 is Asp;
(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2
is not Asp;
(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr
and position ++2 is Asp;
(d) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any
amino acid, provided that position ++2 in the α-helix is not Asp;
(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;
(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;
(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;
(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;
(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;
(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;
(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is Asn or Gln;
(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp;
(m) if base 1 in the quadruplet is G, then position +2 is Asp;
(n) if base 1 in the quadruplet is A, then position +2 is not Asp;
(o) if base 1 in the quadruplet is C, then position +2 is not Asp;
(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.

Two or more composite polypeptides comprising two or more domains which are
selected for binding to two or more target sites can be combined to provide a composite
polypeptide which binds to an aggregate binding site comprising the two or more target
binding sites.

In a still further aspect, the invention provides a computer-implemented method for
designing a zinc finger polypeptide that binds to a target nucleic acid sequence,
comprising the steps of:

(a) providing a system comprising at least storage means for storing data relating
to a library of zinc fingers; storage means for storing a rule table; means for inputting
target nucleic acid sequence data; processing means for generating a result; and means for
outputting the result;
(b) inputting sequence data for a target nucleic acid molecule;
(c) defining a first target zinc finger binding site in said nucleic acid molecule;
(d) interrogating the zinc finger library and rule table storage means, comparing
zinc fingers to the target zinc finger binding site according to the rule table and selecting
zinc finger data identifying a zinc finger capable of binding to said target site;
(e) defining at least one further target zinc finger binding site and repeating step
(d); and
(f) outputting the selected zinc finger data.

Such a method may further comprise sending instructions to an automated chemical
synthesis system to assemble a zinc finger polypeptide as defined by the zinc finger data
obtained in (f).
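
Steps (b) to (f) amount to a loop over binding sites in which each site is compared with every finger in the library. The Python sketch below illustrates that control flow only. The toy rule table holds a small subset of the triplet rules and ignores all conditions on position ++2, and the library entries ZF_a and ZF_b are hypothetical; the order in which selected fingers would be joined in the final polypeptide is not addressed.

```python
# Toy rule table: (index in 5'->3' triplet, base) -> (helix position, permitted residues).
# Only rules that need no "++2" condition are included; this is not the full rule set.
RULE_TABLE = {
    (0, "G"): (6, "R"),
    (1, "G"): (3, "H"), (1, "A"): (3, "N"), (1, "C"): (3, "SDELTV"),
    (2, "G"): (-1, "R"), (2, "A"): (-1, "Q"), (2, "T"): (-1, "NQ"), (2, "C"): (-1, "D"),
}

def finger_matches(site, helix, rule_table):
    """Step (d): compare one finger with one site according to the rule table."""
    for index, base in enumerate(site):
        rule = rule_table.get((index, base))
        if rule is None:
            return False  # this toy table has no rule for the base at this position
        position, permitted = rule
        if helix[position] not in permitted:
            return False
    return True

def design(target, library, rule_table=RULE_TABLE, site_len=3):
    """Steps (b)-(f): split the target into sites and select a finger for each."""
    if len(target) % site_len:
        raise ValueError("target length must be a multiple of the site length")
    sites = [target[i:i + site_len] for i in range(0, len(target), site_len)]
    selected = []
    for site in sites:  # steps (c) and (e)
        hit = next((name for name, helix in sorted(library.items())
                    if finger_matches(site, helix, rule_table)), None)
        selected.append((site, hit))
    return selected  # step (f)

# Hypothetical library entries: helix residues at positions -1, +3 and +6.
toy_library = {
    "ZF_a": {-1: "R", 3: "H", 6: "R"},
    "ZF_b": {-1: "R", 3: "D", 6: "R"},
}
result = design("GCGGGG", toy_library)
```

A site for which no finger is found is reported with None, which in a fuller system would be the point at which an alternative site or an overlap rule is tried.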


In additional embodiments, the sequence of one or more oligonucleotides encoding a
composite binding polypeptide can be determined from the sequence of a composite
binding polypeptide, and the one or more oligonucleotides can be synthesized by any
number of well-known methods.
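
Determining an encoding oligonucleotide from a polypeptide sequence is a reverse translation. A minimal Python sketch follows, assuming one fixed codon per amino acid from the standard genetic code; the particular codon chosen for each residue is an arbitrary illustrative choice and is not a codon-usage recommendation from the disclosure.

```python
# One standard-code codon per amino acid (illustrative choice only).
CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}

def back_translate(peptide):
    """Return one DNA coding sequence for a peptide given in one-letter code."""
    return "".join(CODON[residue] for residue in peptide.upper())

linker_dna = back_translate("TGEKP")  # coding sequence for the TGEKP linker
```

In practice the resulting sequence would be split into overlapping oligonucleotides for assembly; that step is omitted here.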


Preferably, a composite binding polypeptide is tested for binding to a target sequence, and
data from said testing is used to select, from a plurality of possibilities, a composite
binding polypeptide that binds with optimal affinity and specificity to the target site.

Advantageously, two or more zinc finger polypeptides are combined to form a zinc finger
polypeptide capable of binding to an aggregate binding site comprising two or more
target sites.

The rule table preferably comprises rules as set forth above.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows a flowchart depicting part of the logic used in the selection of zinc
fingers from a natural library in accordance with the invention. The logic set forth in
Figure 1 may be supplemented, for example using Rules relating to zinc finger overlap.
Functional testing of zinc fingers for binding to the desired binding site may be
implemented in an automated fashion and integrated with the zinc finger design system.

Figure 2 is a schematic representation of the human zinc finger mini-library construction
procedure. Synthetic zinc finger coding oligonucleotides are assembled into full-length
ds expression constructs by overlap PCR.

Figure 3 is a schematic representation of the fluorescent ELISA assay used to detect zinc
finger peptides bound to double stranded DNA target sites. Streptavidin (7), biotinylated
DNA target (5) linked to biotin (6), 3-finger peptide (4) fused to HA-tag (3), anti-HA
antibody (2) fused to horseradish peroxidase (HRP, 1).

Figure 4 depicts ELISA scores of 384 library 2 constructs screened against the
5'-GCG-TGG-GCG-3' (SEQ ID NO:4) target site. Six constructs showed significant binding, and
are termed C8, G16, I19, I23, J19 and K19, according to their coordinates on the 384-well
plate.

Figure 5 depicts ELISA scores of selected library 2 members; B10, C8, G16, I23, J19,
and K19, against different DNA target sites. The sequences of the target sites are (from
back of graph to front): 5'-GCG-TGG-GCG-3' (SEQ ID NO:5); 5'-CCA-CTC-GGC-3'
(SEQ ID NO:6); 5'-CCT-AGG-GGG-3' (SEQ ID NO:7); 5'-GGA-TAA-GCG-3' (SEQ
ID NO:8); 5'-GGG-AGG-CCT-3' (SEQ ID NO:9); 5'-GCG-TAA-GGA-3' (SEQ ID
NO:10); 5'-GCG-GGG-GGA-3' (SEQ ID NO:11); and no DNA control (front row).

Figure 6 depicts a schematic representation of the 3-zinc finger library constructed
according to the procedure described in Example 2.

DETAILED DESCRIPTION

Definitions 

Unless defined otherwise, all technical and scientific terms used herein have the same
meaning as commonly understood by one of ordinary skill in the art (e.g., in cell culture,
molecular genetics, nucleic acid chemistry, hybridisation techniques and biochemistry).
The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of chemistry, molecular biology, microbiology, recombinant
DNA, immunology, chemical methods, pharmaceutical formulations and delivery and
treatment of patients, which are within the capabilities of a person of ordinary skill in the
art. Such techniques are explained in the literature. See, for example, J. Sambrook, E. F.
Fritsch, and T. Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second
Edition, Books 1-3, Cold Spring Harbor Laboratory Press; Ausubel, F. M. et al. (1995
and periodic supplements; Current Protocols in Molecular Biology, ch. 9, 13, and 16,
John Wiley & Sons, New York, N.Y.); B. Roe, J. Crabtree, and A. Kahn, 1996, DNA
Isolation and Sequencing: Essential Techniques, John Wiley & Sons; J. M. Polak and
James O'D. McGee, 1990, In Situ Hybridisation: Principles and Practice; Oxford
University Press; M. J. Gait (Editor), 1984, Oligonucleotide Synthesis: A Practical
Approach, IRL Press; and, D. M. J. Lilley and J. E. Dahlberg, 1992, Methods of
Enzymology: DNA Structure Part A: Synthesis and Physical Analysis of DNA Methods in
Enzymology, Academic Press. Each of these general texts is herein incorporated by
reference.

The term "library" is used according to its common usage in the art, to denote a collection
of different polypeptides or, preferably, a collection of nucleic acids encoding different
polypeptides. The libraries of natural zinc finger peptides referred to herein comprise or
encode a repertoire of polypeptides of different sequences, each of which has a preferred
binding sequence.

The terms "polypeptide", "peptide" and "protein" are used interchangeably to refer to a
polymer of amino acid residues, preferably including naturally occurring amino acid
residues. Artificial amino acid residues are also within the scope of the invention, but the
exclusive use of naturally-occurring amino acids is preferred in order to maintain the
natural nature of the binding domains. There are 20 common amino acids, each specified
by a different arrangement of three adjacent DNA nucleotides by the genetic code. These
are the building blocks of proteins. Joined together in a strictly ordered chain by peptide
bonds, the sequence of amino acids determines each polypeptide molecule. The 20
common amino acids are: alanine, arginine, aspartic acid, glutamic acid, glutamine,
glycine, histidine, isoleucine, leucine, phenylalanine, proline, serine, threonine,
tryptophan, tyrosine, valine, cysteine, methionine, lysine, and asparagine. Virtually all of
these amino acids (except glycine) possess an asymmetric carbon atom, and thus are
potentially chiral in nature.



As used herein, "nucleic acid" includes both RNA and DNA, and nucleic acids
constructed from natural nucleic acid bases or synthetic bases, or mixtures thereof.
Modified nucleic acids such as, for example, PNAs and morpholino nucleic acids, are
also included in this definition.



A "gene", as used herein, is the segment of nucleic acid (typically DNA) that is involved
in producing a polypeptide chain or ribonucleic acid gene product. It includes regions
preceding and following the coding region (leader and trailer) as well as intervening
sequences (introns) between individual coding segments (exons). Preferably, "gene"
includes the necessary control sequences for gene expression, as well as the coding region
encoding the gene product.

A "binding polypeptide" is a polypeptide capable of binding to a specific target.
Although, as is well known, polypeptides are capable of non-specific binding to a wide
range of substrates, it is also known that certain polypeptides, such as antibodies and
other members of the immunoglobulin superfamily, zinc fingers, leucine zipper
polypeptides, peptide aptamers and the like can bind specifically to target sites or
molecules. Generally, specific binding is preferably achieved with a dissociation constant
(Kd) of 100 µM or lower; preferably 10 µM or better; preferably 1 µM or better; and ideally
0.5 µM or better. Binding polypeptides can be nucleic acid binding polypeptides which
bind to nucleic acid in a target sequence-specific manner, such as zinc finger
polypeptides. Unless specifically noted, no difference is intended herein between terms
such as "peptide", "polypeptide" and "protein".

A "natural binding polypeptide" is a binding polypeptide encoded by the genome of a 
living organism such as, for example, a plant or animal. 


A "composite" polypeptide is a polypeptide that is assembled from a plurality of
components. In a preferred embodiment, the invention provides composite binding
polypeptides that are assembled from a plurality of individual natural binding domains as
set forth in detail herein. Typically, such domains are zinc finger nucleic acid binding
domains.

A "natural binding domain" (or module) is a domain of a naturally occurring polypeptide
that is capable of specific binding to a target as defined above. The terms "domain" and
"module", according to their ordinary signification in the art, refer to a discrete
continuous part of the amino acid sequence of a polypeptide that can be equated with a
particular function. Protein domains or modules are largely structurally independent and
can retain their structure and function in different environments. In certain embodiments,
a natural binding domain or module is a zinc finger that binds a triplet or quadruplet
nucleotide sequence.

Preferably, each of the individual natural binding domains that make up a composite
binding polypeptide contains no changes in sequence, as compared to the natural
sequence. However, those skilled in the art will understand that certain changes including
conservative amino acid substitutions, as well as additions or deletions, may be made
without altering the function of a domain. Moreover, where the changes are consistent
with sequences common to the species from which the domain is derived, such as for
example being present in consensus sequences, they are unlikely to give rise to
immunological problems.

Conservative amino acid substitutions may be made, for example according to Table 1.
Amino acids in the same block in the second column and preferably in the same line in
the third column may be substituted for one another:

Table 1

ALIPHATIC    Non-polar            G A P
                                  I L V
             Polar - uncharged    C S T M
                                  N Q
             Polar - charged      D E
                                  K R
AROMATIC                          H F W Y
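
Table 1 can be captured as a small data structure so that a proposed substitution is classified as falling in the same line (preferred), the same block, or neither. The Python sketch below uses hypothetical names, and the aromatic row, which has no second-column entry in the table, is treated as a single block of one line.

```python
# Table 1: second-column block -> third-column lines (one-letter residue codes).
TABLE_1 = {
    "non-polar": ["GAP", "ILV"],
    "polar-uncharged": ["CSTM", "NQ"],
    "polar-charged": ["DE", "KR"],
    "aromatic": ["HFWY"],  # the table gives no second-column label for this row
}

def substitution_class(a, b):
    """Return 'same line', 'same block' or None for residues a and b."""
    for lines in TABLE_1.values():
        if any(a in line and b in line for line in lines):
            return "same line"
        block = "".join(lines)
        if a in block and b in block:
            return "same block"
    return None
```

For instance, Ile to Val is a same-line substitution, Gly to Leu stays within the non-polar block, and Asp to Phe is not conservative under this table.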



A domain is "derived" from a protein if it is effectively removed from a
naturally-occurring protein for use in a composite binding polypeptide. Removal may be physical
removal, by cleavage of the protein; more commonly, however, the sequence of the
domain is determined and the domain is synthesised by protein synthesis techniques to be
a copy of the naturally-occurring domain. Alternatively, a nucleic acid encoding the
domain is synthesized and expressed in a cell. In vitro synthesised domains, or in vitro
synthesized polynucleotides encoding naturally-occurring domains, are considered to be
"derived" from the natural protein if they recapitulate the sequence of the
naturally-occurring domain.

A "target" is a molecule or part thereof to which a binding polypeptide or a binding
domain is capable of specific binding. The "natural target" of a binding polypeptide is
the target to which that polypeptide binds in nature; e.g., in a living cell. In the case of
zinc finger polypeptides, for instance, the natural target is the nucleotide sequence to
which the polypeptide binds in a living cell. Sequences other than the natural target, as
defined herein, to which a zinc finger polypeptide may bind in vitro are not natural
targets.

In the case of nucleic acid binding polypeptides, therefore, the term "target" may be
substituted or supplemented with "binding site" or "binding sequence." Where binding
sites are assembled to form larger binding sites, which are bound by multi-domain
binding polypeptides, such binding sites are referred to as "aggregate binding sites",
indicating that they are formed by the juxtaposition of two or more individual binding
sites. The aggregate binding sites can comprise contiguous individual binding sites, or
individual binding sites interspersed by one or more intervening nucleotides or sequence
of nucleotides.
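
An aggregate binding site, whether contiguous or with intervening nucleotides, can be located in a longer sequence with a regular expression. The Python sketch below is illustrative only; the function names and the max_gap parameter are hypothetical, and the example sequence is invented for the demonstration.

```python
import re

def aggregate_pattern(sites, max_gap=0):
    """Build a pattern for sites in order, allowing up to max_gap bases between them."""
    gap = f"[ACGT]{{0,{max_gap}}}"
    return re.compile(gap.join(re.escape(site) for site in sites))

def find_aggregate(sequence, sites, max_gap=0):
    """Return (start index, matched aggregate site) or None if no match is found."""
    match = aggregate_pattern(sites, max_gap).search(sequence)
    return (match.start(), match.group()) if match else None

# Two individual sites separated by a single intervening nucleotide.
hit = find_aggregate("TTGCGTGGAGCGTT", ["GCGTGG", "GCG"], max_gap=1)
```

Setting max_gap to zero restricts the search to contiguous individual sites.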

The present invention relates to naturally-occurring zinc fingers and their use as specific
nucleic acid binding modules in combinations not present in nature. This invention
provides methods of determining and/or predicting the nucleotide binding specificities of
natural zinc finger modules. Also provided are methods of constructing poly-zinc finger
peptides containing at least one natural zinc finger module, from libraries of natural zinc
finger peptides, and methods of screening such peptides to determine their preferred
nucleotide binding specificity. Moreover, the invention provides for the use of
combinations of such natural zinc finger modules in poly-zinc finger peptides not present
in nature, to bind any desired nucleotide sequence.

Poly-zinc finger peptides of this invention may contain 2, 3, 4, 5, 6 or more zinc finger
modules. Natural zinc finger modules of this invention may preferably be linked by
canonical, flexible or structured linkers, as set out below and in WO 01/53480, the
disclosure of which is hereby incorporated by reference. More preferably, the linkers are
canonical linkers such as -TGEKP- (SEQ ID NO:3).

The poly-zinc finger peptides of this invention can be given useful biological functions by
the addition of effector domains, creating chimeric zinc finger peptides. Preferably, such
chimeric zinc finger peptides may be used to up- or down-regulate desired genes, in vitro
or in vivo. Preferable effector domains include transcriptional repressor domains,
transcriptional activator domains, transcriptional insulator domains, chromatin
remodelling domains, enzymatic domains, and signalling / targeting sequences or
domains. To cause a desired biological effect, composite binding polypeptides can bind to
one or more suitable nucleotide sequences in vivo or in vitro. Preferred DNA regions
from which to effect the up- or down-regulation of specific genes include promoters,
enhancers or locus control regions (LCRs). Other suitable regions within genomes,
which may provide useful targets for composite binding polypeptides, include telomeres
and centromeres.

The expression of many genes is also regulated by controlling the fate of the associated
RNA transcript. RNA molecules often contain sites for RNA-binding proteins, which
determine RNA half-life. Hence, composite binding polypeptides can also control
endogenous gene expression by specifically targeting RNA transcripts to either increase
or decrease their half-life within a cell.

Composite binding polypeptides can also be fused to epitope tags, which can be detected
by antibodies, and may therefore be used to signal the presence or location of a particular
nucleotide sequence in a mixed pool of nucleic acids, or immobilised on the surface of a
chip or other such surface.

Intracellular localization of composite binding polypeptides can be regulated, for
example, by fusion to a localization domain, for example, a nuclear localization sequence
or a localization domain as disclosed, for example, in PCT/US01/42377.

a. Nucleic Acid Binding Polypeptides 

This invention preferably relates to nucleic acid binding polypeptides. Preferably, the
binding polypeptides of the invention are DNA binding polypeptides. Particularly
preferred examples of nucleic acid binding polypeptides are zinc finger peptides.

Zinc finger peptides typically contain strings of small nucleic acid binding domains, each
stabilised by the co-ordination of zinc. These individual domains are also referred to as
"fingers" and "modules". A zinc finger recognises and binds to a nucleic acid triplet, or
an overlapping quadruplet, in a DNA target sequence. However, zinc fingers are also
known to bind RNA and proteins. Clemens, K. R. et al., (1993) Science 260: 530-533;
Bogenhagen, D. F. (1993) Mol. Cell. Biol. 13: 5149-5158; Searles, M. A. et al., J. Mol.
Biol. 301: 47-60 (2000); Mackay, J. P. & Crossley, M. (1998) Trends Biochem. Sci. 23:
1-4.

Preferably, there are 2 or more zinc fingers, for example 2, 3, 4, 5, 6, or 7 zinc fingers, in
each zinc finger polypeptide. Advantageously, there are 3 or more zinc fingers in each
zinc finger polypeptide.

All of the DNA binding residue positions of zinc finger peptides, as referred to herein, are
numbered from the first residue in the α-helix of the finger, ranging from +1 to +9. "-1"
refers to the residue in the framework structure immediately preceding the α-helix in a
zinc finger peptide. Residues referred to as "++" are residues present in an adjacent
(C-terminal) peptide. Where there is no C-terminal adjacent peptide, "++" interactions do
not operate.

The α-helix of a zinc finger peptide aligns antiparallel to the target nucleic acid strand,
such that the primary nucleic acid sequence is arranged 3' to 5' in order to correspond
with the N-terminal to C-terminal sequence of the zinc finger peptide. Since nucleic acid
sequences are conventionally written 5' to 3', and amino acid sequences N-terminus to
C-terminus, the result is that when a target nucleic acid sequence and a zinc finger
peptide are aligned according to convention, the primary interaction of the zinc finger
peptide is with the "minus" strand of the nucleic acid sequence, since it is this strand
which is aligned 3' to 5'. These conventions are followed in the nomenclature used
herein. It should be noted, however, that in nature certain zinc finger modules, such as
zinc finger 4 of the protein GLI, bind to the "plus" strand of the nucleic acid sequence.
See Suzuki et al. (1994) Nucl. Acids Res. 22: 3397-3405; and Pavletich & Pabo, (1993)
Science 261: 1701-1707. The present invention encompasses incorporation of such zinc
finger peptides into DNA binding molecules.
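
The register described above can be sketched in a few lines of Python. This is a
minimal illustration only, assuming one finger per non-overlapping triplet (the
overlapping fourth base is ignored) and a target site written 5' to 3' for the strand
that the fingers primarily contact. The function name assign_fingers is hypothetical
and is not part of the disclosure.

```python
def assign_fingers(target_5to3):
    """Pair each finger (numbered from the N-terminus) with its triplet.

    Assumes target_5to3 is the contacted strand written 5'->3' and that
    each finger reads one non-overlapping triplet. Because the peptide
    lies antiparallel to that strand, finger 1 reads the 3'-most triplet.
    """
    if len(target_5to3) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    triplets = [target_5to3[i:i + 3] for i in range(0, len(target_5to3), 3)]
    # Reverse so that the N-terminal finger is paired with the 3' triplet.
    return [(n, t) for n, t in enumerate(reversed(triplets), start=1)]


pairs = assign_fingers("AAACCCGCG")
for finger, triplet in pairs:
    print(f"finger {finger}: 5'-{triplet}-3'")
```

Modules that bind the opposite strand, such as GLI finger 4 mentioned above, would
need separate handling and are not covered by this sketch.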

Natural Zinc Finger Peptides. 

In certain embodiments, this invention relates to natural zinc finger modules. As used
herein, the term 'natural', with reference to a zinc finger, means that the DNA sequence
which encodes a particular zinc finger, whether normally expressed in vivo or not, is
found in nature, i.e. is part of the genome of a cell. A natural human zinc finger is one
which is endogenous to the human genome, a natural mouse zinc finger is found in the
mouse genome, and a natural viral zinc finger is found in a viral genome, etc. Natural
zinc finger genes which have become integrated into the genome of a heterologous
species by natural means, e.g., integration of a viral genome into a host genome, are
considered to be endogenous to the host species within the context of this disclosure. A
zinc finger module constructed or produced in vitro or extracted from an in vivo source is
considered to be natural if its amino acid sequence matches that of the amino acid
sequence encoded by its natural gene. The DNA sequence of the natural gene is not the
defining aspect. Thus, polynucleotides encoding natural zinc finger modules may have a
different sequence from that of the naturally-occurring sequence encoding the module,
e.g., to adjust codon usage to optimise expression of the module in a particular expression
system.



Preferably, sequences of zinc fingers used in the present invention are not mutated from
their natural form. Advantageously, the natural zinc finger polypeptides are expressed in
nature.



A natural zinc finger binding motif is a structure well known to those in the art and
defined in, for example, Miller et al., (1985) EMBO J. 4: 1609-1614; Berg (1988) Proc.
Natl. Acad. Sci. USA 85: 99-102; Lee et al., (1989) Science 245: 635-637; see also
International patent applications WO 96/06166 and WO 96/32475, incorporated herein by
reference.

In general, a natural zinc finger framework has the structure:

X₀₋₂ C X₁₋₅ C X₉₋₁₄ H X₃₋₆ H/C   (SEQ ID NO:12)

where X is any amino acid, and the numbers in subscript indicate the possible numbers of
residues represented by X (Formula A).

In a preferred aspect of the present invention, natural zinc finger nucleic acid binding
motifs may be represented as motifs having the following primary structure:

X₀₋₂ C X₁₋₅ C X₂₋₇ X  X X X X X X H X₃₋₆ H/C   (SEQ ID NOS:13 and 14)
                   -1 1 2 3 4 5 6 7

where X is any amino acid, and the numbers in subscript indicate the possible numbers of
residues represented by X (Formula A'). The numbers -1 through 7 refer to amino acid
position with respect to the beginning of the alpha-helical region of the zinc finger.

The Cys and His residues, which together co-ordinate the zinc metal atom, are marked in
bold text and are usually invariant. However, all naturally-occurring zinc finger modules, 
even if they diverge from the above formula, are encompassed within the scope of this 
invention. 
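
As an informal illustration, Formula A' can be written as a regular expression over
one-letter amino acid codes. The sketch below assumes that a candidate module has
already been excised from its protein and carries no linker residues. The names
FORMULA_A_PRIME and recognition_residues are hypothetical. The test sequence is
finger 1 of Zif268 as listed in Table 2 of this document.

```python
import re

# Formula A' as a regular expression (one-letter amino acid codes).
# The named group captures the seven residues at positions -1 to +6,
# which immediately precede the first zinc-coordinating histidine (+7).
FORMULA_A_PRIME = re.compile(
    r".{0,2}C.{1,5}C.{2,7}(?P<helix>.{7})H.{3,6}[HC]"
)


def recognition_residues(module):
    """Return positions -1..+6 of a module matching Formula A', else None."""
    match = FORMULA_A_PRIME.fullmatch(module)
    return match.group("helix") if match else None


# Finger 1 of Zif268, as listed in Table 2.
print(recognition_residues("YACPVESCDRRFSRSDELTRHIRIH"))
```

Where a sequence admits more than one assignment of the variable-length spacers, the
expression returns whichever assignment the regex engine finds first, so ambiguous
cases would need manual inspection. Natural modules that diverge from the formula,
which the text above also encompasses, are simply not matched.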

Zinc finger modules of formula A' are often arranged in tandem within a natural zinc
finger polypeptide, such that a zinc finger containing protein may have 2, 3, 4, 5, 6, 7, 8,
9 or more individual zinc finger motifs. In such a protein, individual zinc fingers are
joined to each other by a polypeptide sequence known as a linker. Generally, such a
natural linker lacks secondary structure, although the amino acids within the linker may
form local interactions when the protein is bound to its target site. By 'linker sequence' is
meant an amino acid sequence that links together adjacent zinc finger modules. For
example, in a natural zinc finger protein, the linker sequence is the amino acid sequence
which lies between the last residue of the α-helix in a zinc finger and the first residue of
the β-sheet in the next zinc finger. The linker sequence therefore joins together two zinc
fingers. For the purposes of the present invention, the last amino acid of the α-helix in a
zinc finger is considered to be the final zinc coordinating histidine (or cysteine) residue,
while the first amino acid of the following finger is generally a tyrosine / phenylalanine or
another hydrophobic residue. Since some natural zinc fingers do not start with a
hydrophobic residue (see Appendices), the start of a finger is sometimes harder to define
from amino acid sequence (or indeed zinc finger structure), and so some flexibility must
be allowed in this definition. Accordingly, in a natural zinc finger protein, threonine is
often considered to be the first residue in the linker, and proline is the last residue of the
linker. Thus, for example, in the natural Zif268 peptide the linker sequence is
-TG(E/Q)(K/R)P- (SEQ ID NO:15). Although natural linkers can vary greatly in terms of
amino acid sequence and length, on the basis of sequence homology, the canonical
natural linker sequence is considered to be -TGEKP- (SEQ ID NO:3). Hence, the
preferred linker sequence to join zinc finger modules of the present invention is
-TGEKP-.

Additionally, a 'leader' peptide may be added to the N-terminal zinc finger of a poly-zinc
finger peptide to aid its expression, without changing the sequence of the natural zinc
finger module. Preferably, the leader peptide is MAEERP (SEQ ID NO:16) or MAERP
(SEQ ID NO:17).



In general, naturally occurring zinc finger modules may be selected from those proteins
for which the DNA binding specificity is already known. For example, these may be the
proteins for which a crystal structure has been resolved: namely Zif268 (Elrod-Erickson
et al. (1996) Structure 4: 1171-1180), GLI (Pavletich & Pabo (1993) Science 261:
1701-1707), Tramtrack (Fairall et al. (1993) Nature 366: 483-487) and YY1 (Houbaviy et
al. (1996) Proc. Natl. Acad. Sci. USA 93: 13577-13582). Furthermore, the sequence
specificity of many naturally-occurring zinc fingers and zinc finger proteins is known.
In addition, this invention further provides for the determination of the binding specificity
of natural zinc finger modules for use in the present invention. See "Prediction of
Binding Specificity," infra.

Poly-Zinc Finger Peptides.



It is desirable that a 'designer' transcription factor for uses such as gene therapy
and in transgenic organisms should have the ability to target virtually unique sites within
any genome. For complex genomes such as in humans, an address of at least 16 bp is
required to specify a potentially unique DNA sequence. Shorter DNA sequences have a
significant probability of appearing several times in a genome, raising the possibility of
obtaining undesirable non-specific gene targeting with a designed transcription factor
targeted to such a shorter sequence. As individual zinc fingers only bind 3 to 4
nucleotides, it is therefore necessary to construct multi-finger polypeptides to target these
longer sequences. A six-zinc finger peptide (with an 18 bp recognition sequence) could,
in theory, be used for the specific recognition of a single target site and hence, the
specific regulation of a single gene within any genome. In addition, a significant increase
in binding affinity might also be expected, compared to a protein with fewer fingers. In
simple terms, if a three-finger peptide (with a 9 bp recognition sequence) binds DNA with
nanomolar affinity, two tandemly linked three-finger peptides might be expected to bind
an 18 bp sequence with an affinity of 10"^^-10"^^ M. However, most previous attempts at
producing high-affinity 6-finger peptides (poly-zinc finger peptides) based on fusions of
two 3-finger domains have been unsuccessful in generating much of an improvement in
affinity over 3-finger peptides. Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas, C. F. III
(1997) Proc. Natl. Acad. Sci. USA 94: 5525-5530; Kim, J-S. & Pabo, C. O. (1998) Proc.
Natl. Acad. Sci. USA 95: 2812-2817; Kamiuchi, T., Abe, E., Imanishi, M., Kaji, T.,
Nagaoka, M. & Sugiura, Y. (1998) Biochemistry 37: 13827-13834. To optimise both the
affinity and specificity of 6-finger peptides, a fusion of three 2-finger domains has been
shown to be advantageous. Moore, M., Klug, A. & Choo, Y. (2001) Proc. Natl. Acad.
Sci. USA 98: 1437-1441; and WO 01/53480. Therefore, in one embodiment, 2-finger
units are linked to make poly-zinc finger nucleotide-binding domains. A pool of 4096
such 2-finger units, that recognise all possible 6 bp sequences (4⁶ = 4096), represents an
archive sufficient to rapidly create universal nucleic acid recognition, by simple linkage,
in an "off-the-shelf" manner. See Moore et al., supra and WO 01/53480.
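
The counting argument in this passage can be checked with a short calculation. The
sketch below is illustrative only. It assumes a genome of 3 x 10^9 bp (a round figure
for a human-scale haploid genome that is not taken from this disclosure) and treats
bases as equally frequent and independent, which real genomes are not. The function
name expected_occurrences is hypothetical.

```python
def expected_occurrences(site_length, genome_size, both_strands=True):
    """Expected number of exact matches to one site in a random genome.

    Assumes the four bases are equally frequent and independent, so the
    result is an order-of-magnitude guide rather than a prediction.
    """
    positions = genome_size * (2 if both_strands else 1)
    return positions / 4 ** site_length


GENOME_SIZE = 3_000_000_000  # assumed genome size in bp (not from the source)

for fingers in (1, 2, 3, 6):
    n = 3 * fingers  # one triplet per finger; overlapping bases ignored
    print(f"{fingers} finger(s), {n} bp: {4 ** n} possible sites, "
          f"~{expected_occurrences(n, GENOME_SIZE):.3g} expected matches")
```

Under these assumptions a 9 bp site is expected to recur many thousands of times,
whereas an 18 bp site is expected less than once, which is the motivation given above
for six-finger peptides.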

Poly-zinc finger peptides according to this invention may be constructed
containing 2, 3, 4, 5, 6 or more zinc finger modules. Such poly-zinc finger peptides may
contain inter-finger linkers other than the canonical (TGEKP) linker sequence, as
described, for example, in WO 01/53479; Moore, M., Choo, Y. & Klug, A. (2001) Proc.
Natl. Acad. Sci. USA 98: 1432-1436; and Moore, M., Klug, A. & Choo, Y. (2001) Proc.
Natl. Acad. Sci. USA 98: 1437-1441. Briefly, linker sequences may be flexible or
structured but, in general, will not form base-specific interactions with the target
nucleotide sequence. A 'flexible' linker is defined as one which does not form a specific
secondary structure in solution, whereas a 'structured' linker is defined as one that adopts
a particular secondary structure in solution. Preferably, flexible linkers include the
sequences GGERP (SEQ ID NO:18), GSERP (SEQ ID NO:19), GGGGSERP (SEQ ID
NO:20), GGGGSGGSERP (SEQ ID NO:21), GGGGSGGSGGSERP (SEQ ID NO:22) and
GGGGSGGSGGSGGSGGSERP (SEQ ID NO:23). Preferably, the structured linker
comprises an amino acid sequence that is not capable of specifically binding nucleic acid.
More preferably, the structured linker comprises the amino acid sequence of TFIIIA
finger IV. Alternatively, or in addition, the structured linker is derived from a zinc finger
by mutation of one or more of its base contacting residues to reduce or abolish nucleic
acid binding activity of the zinc finger. The zinc finger may be finger 2 of wild type
Zif268 mutated at positions -1, 2, 3 and/or 6.

In one embodiment, this invention provides for the construction and screening of
poly-zinc finger peptides containing at least one natural zinc finger module.

In another embodiment, this invention provides for the construction and screening of
poly-zinc finger peptides containing at least one natural zinc finger module, linked with
the canonical linker sequence -TGEKP- (SEQ ID NO:3).

In one embodiment, methods for the construction and use of poly-zinc finger peptides
comprising natural zinc finger modules are provided.

In another embodiment, methods for the construction and use of poly-zinc finger peptides
comprising natural zinc finger modules, linked with the canonical linker sequence
-TGEKP- (SEQ ID NO:3), are provided.

In a further embodiment, methods for the construction and use of poly-zinc finger
peptides comprising at least one natural zinc finger module, containing either flexible or
structured linkers (as described above and in WO 01/53480), are provided.

b. Advantages of Natural Zinc Finger Modules 

Zinc finger modules are compact and stable structures of approximately 30 amino acids,
which contain the full information required to bind a nucleic acid triplet or overlapping
quadruplet. As such, they have proven to be extremely versatile scaffolds for engineering
novel DNA-binding domains. See, for example, Rebar, E. J. & Pabo, C. O. (1994)
Science 263, 671-673; Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry
33, 5689-5695; Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91,
11163-11167; Choo, Y., Sanchez-Garcia, I. & Klug, A. (1994) Nature 372, 642-645; Wu, H.,
Yang, W.-P. & Barbas III, C. F. (1995) Proc. Natl. Acad. Sci. USA 92, 344-348;
Greisman, H. A. & Pabo, C. O. (1997) Science 275, 657-661; Isalan, M., Klug, A. &
Choo, Y. (1998) Biochemistry 37, 12026-12033; Choo, Y. (1998) Nature Struct. Biol. 5,
264-265; Segal, D. J., Dreier, B., Beerli, R. R. & Barbas, C. F. (1999) Proc. Natl. Acad.
Sci. USA 96, 2758-2763; Isalan, M. & Choo, Y. (2000) J. Mol. Biol. 295, 471-477; and
Beerli, R. R., Dreier, B., Barbas, C. F. (2000) Proc. Natl. Acad. Sci. USA 97, 1495-1500.
The resulting engineered zinc finger domains have increased our knowledge of
sequence-specific DNA recognition, as well as provided a wide range of potential tools for
medicine and biotechnology.



As a result of these and other studies on zinc finger engineering, it has been recognised
that an individual zinc finger module does not necessarily recognise a simple nucleotide
triplet, as was first thought; but instead, can bind to an overlapping quadruplet of double
stranded DNA. See, for example, Isalan et al. (1997) Proc. Natl. Acad. Sci. USA 94,
5617-5621; and WO 98/53057. In this respect, zinc finger engineering strategies have been
particularly important for deciphering the mechanism and specificity of these interactions.

With the recent completion of the human genome project and the rapidly advancing fields
of transgenic animals and plants, thousands of uncharacterised (and characterised) genes
have become, and will continue to become, valid targets for functional genomics and
other such projects. Concomitantly, engineered zinc finger peptides (often as a component
of "designer" transcription factors) are emerging as one of the most universal and
desirable ways of regulating the expression of specific genes within cells. See, for
example, Choo, Y., Sanchez-Garcia, I. & Klug, A. (1994) Nature 372: 642-645; Beerli,
R. R., Dreier, B. & Barbas, C. F. III (2000) Proc. Natl. Acad. Sci. USA 97: 1495-1500;
Kim, J-S. & Pabo, C. O. (1998) Proc. Natl. Acad. Sci. USA 95: 2812-2817; Kang, J. S. &
Kim, J-S. (2000) J. Biol. Chem. 275: 8742-8748; Zhang et al. (2000) J. Biol. Chem.
275: 33,850-33,860; Liu et al. (2001) J. Biol. Chem. 276: 11,323-11,334; Ren et al.
(2002) Genes Dev. 16: 27-32; and WO 00/41566.

Notwithstanding the remarkable progress in zinc finger engineering, there remain several
issues that limit the use of engineered zinc fingers for such applications. Points of
particular concern include the potential immunogenicity of non-natural zinc fingers, and
the 'fine-tuning' of particular aspects of the protein-DNA interactions to obtain optimal
and specific zinc finger-nucleic acid contacts.

The present invention overcomes problems of immunogenicity and of achieving optimal
binding specificity, by exploiting the vast repertoire of naturally occurring zinc fingers to
construct targeted zinc finger proteins having novel specificities.

Immunogenicity 

The main function of the immune system is to detect, and render harmless, foreign
particles which have invaded the body as a whole, or individual cells or organs. 'Foreign'
in this context means non-host, i.e. a substance which has originated from a different
species, or one which has originated as a result of a mutational event (such as might
generate a malignant cell). On encountering such an antigenic particle, either in solution
or on the surface of an infected cell, the body's defences rapidly destroy/remove it by
complex pathways which involve the interaction of many members of the immune
system. For a good overview of immunology see Roitt, Essential Immunology, Blackwell
Science Ltd. and Roitt, I., Brostoff, J. & Male, D. Immunology, 4th Ed. Mosby. Hence, all
biological therapeutic agents, such as peptides, nucleic acids, viruses, etc., risk eliciting
an immune response in the recipient. Particularly for cases in which repeated doses of a
therapeutic agent are required, this response can be strong and potentially dangerous to
the host organism.

The immune system functions through either innate or adaptive responses. The innate
response is usually the body's first internal line of defence. Phagocytic cells recognise
and bind to foreign objects in extracellular environments. Once bound, the foreign object
is internalised and destroyed. Foreign therapeutic agents such as peptides and nucleic
acids, which are administered directly to the blood stream of the recipient, risk being
detected and possibly destroyed before they even reach their intended target. This
response is one of primitive non-specific recognition of non-host agents, and does not
adapt with time or exposure to the antigen.

Foreign therapeutic agents (or infectious agents such as bacteria and viruses), which
evade the innate immune response and may have been successfully delivered to a
particular cell, have not necessarily avoided the host's immune system. Proteins that are
expressed in cells are routinely degraded within lysosomes, and short peptide fragments,
generally of between 6 and 9 amino acids, are transported to the cell surface and
presented to the host's immune system. This is the start of the host's second internal
defence mechanism against invasion, the adaptive immune response. The proteins
responsible for displaying such peptide fragments are known as major histocompatibility
complex (MHC) proteins. Lymphocyte cells, known as T-lymphocytes, dock with the
MHC proteins and scan the peptide fragments displayed. Contact of a T-lymphocyte with
a fragment specifically recognised as not belonging to the host organism initiates an
immunological cascade which ultimately results in the host cell being destroyed or
undergoing apoptosis. This mechanism is one of specific recognition, and once
recognised as foreign, the antigen is 'remembered' so that any future invasions by the
agent are dealt with more and more rapidly. B-cells are another type of lymphocyte, which
recognise extracellular particles and then produce and release antibodies to help combat
the agent.

To avoid potentially damaging the host organism and to ensure the successful delivery
and action of a therapeutic peptide it is important to make it as much like a host protein as
is reasonably possible. In the case of synthesised therapeutic antibodies for human use, a
great deal of work has gone into the 'humanisation' of antibodies produced by other
animal species (see EP 0239400). In this invention we present a solution for the
equivalent problem associated with zinc finger therapeutic peptides.

To some extent, prior art zinc finger engineering strategies have attempted to minimise
the risk of eliciting immune responses by using an engineering scaffold that is compatible
with (i.e. that originates from) the recipient, and by limiting the sizes of the varied regions
within the final product. For example, typical engineered zinc fingers utilize a scaffold
such as the three-finger DNA-binding domain of Zif268 (containing approximately 100
amino acid residues). Because the amino acid sequence of Zif268 is completely
conserved in a variety of species, including mice and humans, the scaffold is not itself
immunogenic in these species. However, in order to engineer new DNA-binding
domains, stretches of approximately 7 amino acids must be varied within each zinc
finger. These sequences of 7 amino acids represent modifications in positions -1, 1, 2, 3,
4, 5, and 6 of the α-helix of each finger. Although these engineered regions are
considered to be relatively small, they are approximately the length of the peptide
fragments displayed on the surface of cells by MHC molecules. Hence, they may provide
antigenic peptide fragments in several registers of the amino acid sequence, which may
result in dangerous and/or undesirable immune responses in the host.
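
The phrase 'several registers' can be made concrete by counting how many contiguous
fragments of a given length include at least one varied residue. The sketch below is a
counting exercise only and makes no claim about which fragments would actually be
processed, displayed or recognised. It assumes a single 25-residue finger (the length
of Zif268 finger 1 as listed in Table 2, without linker) in which helix positions -1 to
+6 occupy zero-based indices 13 to 19. The function name windows_overlapping is
hypothetical.

```python
def windows_overlapping(length, varied, window):
    """Count contiguous windows of a given size that include a varied position."""
    varied = set(varied)
    starts = range(length - window + 1)
    return sum(1 for s in starts if varied & set(range(s, s + window)))


finger_length = 25                # Zif268 finger 1 as listed in Table 2
varied_positions = range(13, 20)  # zero-based indices of helix positions -1 to +6

for window in (6, 7, 8, 9):
    n = windows_overlapping(finger_length, varied_positions, window)
    print(f"{window}-residue fragments containing a varied residue: {n}")
```

In a multi-finger peptide the count would be larger still, because fragments can also
span a linker and the start of the neighbouring finger.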

Accordingly, it is not known whether this type of engineering strategy will be entirely
sufficient to avoid all potential undesirable effects, or indeed whether it will create the
optimal framework for all zinc finger-nucleic acid interactions.

In addition to the zinc fingers themselves, it is also possible that inter-finger linker
sequences could present potential immunological problems. Fortunately, natural zinc
finger proteins display strong conservation and homology in their linker sequence. A
very large number of natural fingers are joined by the canonical linker peptide -TGEKP-
(SEQ ID NO:3), located between the final zinc chelating residue (usually histidine) of the
first finger, and the first residue of the second finger (usually a large hydrophobic residue
such as tyrosine or phenylalanine, which begins the β-sheet). Hence, the use of the
canonical linker sequence -TGEKP- (SEQ ID NO:3), to join natural zinc finger modules
in a non-natural order, will reduce the possibility of eliciting an undesirable immune
reaction to a minimum. Furthermore, there are so many natural zinc fingers which are
already joined by canonical linker sequences, that if deemed necessary, the database of
natural zinc fingers used for the construction of poly-zinc finger peptides may be
restricted to those already flanked by such linkers.

The periodicity of zinc fingers and their amenability to linkage using the TGEKP (SEQ
ID NO:3) motif is illustrated in Table 2.

                               α-HELIX                      LINKER
                              -1123456
YA CPVESCDRRFS (SEQ ID NO:24)  RSDELTRHIRIH (SEQ ID NO:25)  TGEKP
FQ CRI  CMRNFS (SEQ ID NO:26)  RSDHLSTHIRTH (SEQ ID NO:27)  TGEKP
FA CDI  CGRKFA (SEQ ID NO:28)  RSDERKRHTKIH (SEQ ID NO:29)  TGEKP

Table 2. A functional three-finger DNA-binding domain based on the peptide sequence
of Zif268. TGEKP linker motifs are underlined. The helical residues of each zinc finger
are numbered relative to the first helical position, position +1. Conserved Cysteines and
Histidines forming the classical Cys2His2 zinc finger core are shown in bold.
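
The modular layout in Table 2 lends itself to a simple string-assembly sketch. The
example below is illustrative only. It joins the three Zif268 fingers from Table 2 with
the canonical linker and prepends the MAERP leader (SEQ ID NO:17) mentioned earlier.
Alignment gaps are removed, and the third finger is written with a cysteine at its first
zinc-coordinating position, as the Cys2His2 motif requires. Linkers are placed only
between fingers, so the linker shown after the last finger in Table 2 is omitted. The
function name assemble is hypothetical.

```python
CANONICAL_LINKER = "TGEKP"   # SEQ ID NO:3
LEADER = "MAERP"             # SEQ ID NO:17

# Zif268 fingers from Table 2, each written from its first residue to the
# final zinc-coordinating histidine (alignment gaps removed).
ZIF268_FINGERS = [
    "YACPVESCDRRFSRSDELTRHIRIH",
    "FQCRICMRNFSRSDHLSTHIRTH",
    "FACDICGRKFARSDERKRHTKIH",
]


def assemble(fingers, linker=CANONICAL_LINKER, leader=LEADER):
    """Concatenate finger modules N- to C-terminus with a linker between each pair."""
    if not fingers:
        raise ValueError("at least one finger module is required")
    return leader + linker.join(fingers)


peptide = assemble(ZIF268_FINGERS)
print(peptide)
print(len(peptide), "residues")
```

Swapping entries in the list for other natural modules gives the non-natural
combinations discussed in this section, although, as the following paragraphs note,
neighbouring fingers are not guaranteed to be compatible.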



Fine-Tuning of Zinc Finger-Nucleic Acid Interactions.

It has previously been shown that zinc fingers cannot simply be regarded as independent
nucleic acid-binding modules. Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37,
12026-12033; Isalan, M., Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. USA 94,
5617-5621. The interactions between adjacent zinc fingers can be complex and involve
overlap of binding sites, which means that optimal interfaces are not easily engineered
through rational design. Combinatorial library selection systems, which if designed
correctly necessarily result in interface compatibility, can help to engineer better
optimisation of the zinc finger-nucleic acid interface. See, for example, WO 98/53057.
However, all library selection systems suffer from the problem of library size, whereby
because of physical constraints, it is impossible to include an exhaustive combination of
randomisations to cover all potentially important sequence-space. For example, to
optimise the zinc finger-nucleic acid interface, subtle amino acid variations may be
needed, even from positions outside the recognition α-helix. Furthermore, alternative
approaches to zinc finger engineering, such as 'affinity maturation' through random
mutation or gene shuffling, which may (to a limited extent) increase the coverage of
sequence space, may also raise the probability of generating undesirable immunological
problems. Hence, it is possible that the creation of truly optimal zinc finger domains for
recognition of specific nucleic acid sequences may be outside the scope of traditional
engineering strategies.

In contrast, naturally occurring zinc finger modules have already been 'fine-tuned' by
thousands of years of natural selection and are, under normal circumstances,
non-immunogenic in their host organism. The human genome project has revealed that zinc
finger-containing proteins constitute the second most abundant family of proteins in
humans, with well over 600 members. Since zinc finger proteins usually contain several
individual zinc finger modules, the human genome provides a repertoire of thousands of
natural zinc finger modules for the creation of composite binding polypeptides.
Furthermore, because there are only 64 (= 4³) possible 3 bp sequences and 256 (= 4⁴)
possible 4 bp sequences, it is likely that, for every potential 3- or 4-nucleotide target
sequence, a natural zinc finger domain exists which is capable of binding to it.
Consequently, natural zinc fingers are a very useful resource for the production of
composite binding polypeptides comprising zinc fingers. At present, the natural binding
site of many natural zinc finger modules is not known. Thus, to be useful for the
construction of composite binding polypeptides, nucleotide sequence preferences for
certain natural zinc fingers are determined according to rules tables disclosed in the
following section ("Binding Specificity of Natural Zinc Finger Modules").

To create optimal poly-zinc finger peptides the potentially significant problem of
interface incompatibility must be addressed, since natural zinc finger modules will not
necessarily be compatible with each other when juxtaposed. In this respect, a library
construction and screening system is preferably employed which links natural zinc finger
modules in non-natural combinations, and screens them against possible target sequences
of greater than 3 or 4 bp in length (which represents the possible binding site of a single
zinc finger module), to determine optimal 2- or 3-finger domains. In this way, the
cooperative nature of zinc finger binding is taken into account in the design and selection
of composite binding polypeptides, and in the determination of the sequence specificity of
their binding. In one embodiment, a library of poly-zinc finger peptides containing at
least one natural zinc finger module is provided. Preferably, poly-zinc finger peptides of
the library contain at least two natural zinc finger modules.

c. Binding Specificity of Natural Zinc Finger Modules

Disclosed herein are certain improvements that address current limitations on the use of
customised zinc finger nucleic acid binding domains, through the use of natural zinc
finger modules. By using either natural 1-finger or 2-finger sub-domains, and/or novel
combinatorially-mixed, pre-selected 2-finger sub-domains, it is possible to construct
poly-zinc finger peptides that bind any desired nucleotide target sequence, using
non-natural combinations of natural zinc fingers.

This approach is particularly suited for human gene therapy applications, but the
invention is not limited to zinc finger modules encoded by the human genome. For
applications within transgenic animals such as mice, chickens, etc., the same system can
be used, but incorporating natural zinc finger modules from those species instead (see
Example 3). The genome of any organism (e.g., animal, plant, bacterium, virus, etc.) can
thus provide a genetic 'toolbox' of non-immunogenic, structurally optimised zinc fingers
for applications in that organism.

Before such zinc finger modules can be utilised, however, it is essential that their optimal
binding site is determined, in isolation, or preferably as part of a 2- or 3-finger
subdomain. Natural zinc finger modules are advantageously fused into subdomains
comprising two or three zinc finger modules in random arrangement, optionally
comprising an anchor finger, then subjected to binding site analysis. An 'anchor' zinc
finger is one for which the binding specificity is known, such as, for example, finger 1 or
finger 3 of Zif268, each of which binds the sequence 5'-GCG-3'. An anchor finger is
attached to the N- or C-terminus of the zinc finger module(s) or subdomain for which the
binding specificity is to be determined, and acts as an anchor to set the binding register
for the binding site selection. For example, if the binding site preference of a pair of
natural zinc fingers is to be determined, finger 1 of Zif268 may be fused to the
N-terminus of the pair of natural fingers, and a 5'-GCG-3' anchor sequence is placed at the
3' end of 6 or more randomised nucleotides. Selection of the optimal binding site may
thus be conducted with an oligonucleotide containing the sequence 5'-XXX-XXX-GCG-3'
(SEQ ID NO:30), where X is any specified nucleotide. The anchor sequence thereby
allows the binding site preference of the zinc finger libraries to be easily determined.
Such procedures are described in the Examples.
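
As a rough illustration of the anchored selection site described above, the following Python sketch enumerates every oligonucleotide of the form 5'-XXX-XXX-GCG-3'. The function name `anchored_sites` and its parameters are illustrative assumptions and are not part of the disclosure.

```python
from itertools import product

BASES = "ACGT"

def anchored_sites(n_random=6, anchor="GCG"):
    """Yield every site with n_random randomised bases 5' of a fixed 3' anchor."""
    for combo in product(BASES, repeat=n_random):
        yield "".join(combo) + anchor

sites = list(anchored_sites())
print(len(sites))   # 4**6 = 4096 candidate sites
print(sites[0])     # AAAAAAGCG
```

Keeping the anchor at the 3' end fixes the binding register, so only the six randomised positions vary between candidate sites.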

Screening for Zinc Finger Binding Specificity 

There are various approaches, known to those in the art, for screening nucleic acid
binding peptides for their binding specificity. To determine the binding specificity of, for
example, zinc finger peptides, procedures can be conducted using: (a) a library of zinc
fingers and a specified target sequence - to select one or more zinc finger peptides with a
particular binding preference; or (b) a single zinc finger peptide and a random population
of target sequences - to select one or more optimal binding sites for a particular peptide.
For many applications, such as for the creation of transcription factors for regulating
specific gene activity, it is often preferable to screen zinc finger libraries against specific
target sequences. In this way, the search is geared towards a particular application.
However, if the function or binding specificity of a natural protein is the object of the
investigation, a library of potential binding sites can be screened using a single peptide.
Some such methods are outlined below.

A typical method for screening libraries of nucleic acid binding polypeptides against
specific target sites is that of phage display. Phage display protocols generally involve
expressing the peptides under study as fusions with the gIII major coat protein of
bacteriophage (J. McCafferty, R. H. Jackson, D. J. Chiswell, (1991) Protein Engineering
4, 955-961). Suitable protocols for the selection of zinc finger peptides have been
described and are well known to those in the art. See, for example, Choo, Y. & Klug, A.
(1994) Proc. Natl. Acad. Sci. U.S.A. 91, 11163-11167; Choo, Y., Sanchez-Garcia, I. &
Klug, A. (1994) Nature 372, 642-645; Choo, Y. (1998) Nature Struct. Biol. 5, 264-265;
Choo, Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125; Isalan, M., Klug, A. &
Choo, Y. (1998) Biochemistry 37, 12026-12033; Isalan, M. & Choo, Y. (2000) J Mol
Biol 295, 471-477; Isalan, M., Choo, Y. & Klug, A. (1997) Proc Natl Acad Sci 94,
5617-5621; WO 01/53480, WO 01/53479, WO 96/06166, WO 98/53057, WO 98/53058, WO
98/53059 and WO 98/53060 and references cited therein; see also Examples, infra. In
general, sequences comprising target sites are bound, such as through biotin-streptavidin,
to a solid support, such as a magnetic particle, or the surface of a tube or well. A solution
of phage expressing members of a library of zinc finger peptides is then added to the
immobilised target site. Non-bound phage are washed away and bound phage (containing
the DNA encoding the bound zinc finger peptide) are collected. The collected phage
sample is usually reused in further rounds of selection to enrich for the tightest binding
zinc finger peptide.

Phage display protocols based on random mutagenesis of zinc finger modules are known
to have a number of limitations. First, as discussed above, the library size that can be
expressed on the surface of phage is limited by the efficiency of procedures such as
cloning and transformation. Furthermore, the efficiency of incorporation of gIII-zinc
finger fusions into phage and hence, zinc finger peptide expression, is determined by the
number of zinc finger modules. Therefore, 2-finger peptides are expressed more
efficiently than 3-finger peptides and so on. For this reason, phage display protocols are
generally limited to the assay of polypeptides comprising 3 or fewer zinc finger modules.


An alternative to phage display is an in vitro selection system. In such a system, libraries
of zinc fingers can be produced by PCR using degenerate primer oligonucleotides.
Target binding sites are added to the end of the DNA encoding the zinc finger peptide.
Zinc finger peptide expression may be performed directly from PCR products using an in
vitro expression kit, such as the TNT T7 Quick Coupled Transcription/Translation
System for PCR DNA (Promega, Madison, WI, USA), or another suitable expression
system. The components of the expression reaction (including the zinc finger
gene/binding site) are compartmentalised by suspension in an emulsion, in such a way
that (on average) only one copy of the zinc finger gene / binding site is present in each
compartment. See, for example, Tawfik, D.S. & Griffiths, A.D. (1998) Nat. Biotechnol.
16: 652-656. Zinc finger peptides which bind the specified target site (and the gene
encoding them) can be collected using, for example, a suitable epitope tag (such as myc,
FLAG or HA tags), and the non-bound binding sites/zinc finger genes are removed. The
genes encoding zinc finger peptides that bind the required target site can then be
amplified by PCR and used in further rounds of selection if required.
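
The requirement that each emulsion compartment holds, on average, a single gene can be explored numerically. The sketch below assumes that gene copies distribute among compartments according to a Poisson distribution; that statistical model, and the function names, are assumptions made for illustration and are not stated in the text.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k gene copies in a compartment (Poisson assumption)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def occupancy(mean_copies):
    """Return fractions of compartments holding zero, one, or several gene copies."""
    empty = poisson_pmf(0, mean_copies)
    single = poisson_pmf(1, mean_copies)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}

for mean in (0.1, 0.5, 1.0):
    fractions = occupancy(mean)
    print(mean, {name: round(value, 3) for name, value in fractions.items()})
```

Under this assumption, lowering the mean number of copies per compartment reduces the share of compartments in which a gene could be recovered alongside a peptide encoded by a different gene, at the cost of more empty compartments.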

A preferred method for selecting a zinc finger peptide which binds a specified target
sequence is described in Example 4. Briefly, the DNA encoding a library of zinc finger
peptides with an attached epitope tag is diluted into as many aliquots as it is possible to
screen (e.g. 384 or 1534 aliquots). This creates pools of sub-libraries with reduced
numbers of variants. The DNA is then amplified by PCR and used to produce protein,
from a suitable in vitro expression system, as described above. A specified binding site
with an attached biotin molecule, and a horseradish peroxidase (HRP)-conjugated
antibody to the peptide-attached epitope tag may then be added. Binding site / bound
zinc finger / antibody complexes may be collected by binding to streptavidin and the
samples are washed to remove unbound zinc finger and antibodies. The samples
containing the highest amount of bound zinc finger peptide can be detected by adding an
HRP substrate solution. The original DNA stock from such positive samples may then be
diluted into aliquots (as above), PCR-amplified and used for the next round of selection.
In this way, pools of zinc finger encoding genes with the desired activity are isolated,
subdivided into pools of reduced variation and re-isolated until the most active clone is
identified.
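
To give a feel for how quickly repeated subdivision narrows a library, the following sketch counts the rounds needed to reach a single variant per pool. It assumes an idealised procedure in which one positive pool is carried forward each round and its variants are spread evenly over the aliquots; the function name and this even-split model are assumptions, not part of the method of Example 4.

```python
def rounds_to_single_clone(library_size, aliquots):
    """Count idealised pooling rounds until one variant per pool remains."""
    if library_size < 1 or aliquots < 2:
        raise ValueError("library_size must be >= 1 and aliquots >= 2")
    rounds = 0
    variants_per_pool = library_size
    while variants_per_pool > 1:
        # Ceiling division: a pool cannot hold a fractional variant.
        variants_per_pool = -(-variants_per_pool // aliquots)
        rounds += 1
    return rounds

# A library of one million variants split into 384 aliquots per round.
print(rounds_to_single_clone(10**6, 384))   # 3
```

In practice dilution samples variants randomly rather than evenly, so the count should be read as a lower bound on the number of rounds.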

Principal advantages of the in vitro systems described above are: (a) there is virtually no
limit to the library size which can be screened (up to 10^^ different PCR products can
easily be made); and (b) polypeptides comprising larger numbers of linked zinc finger
modules (e.g., 4, 5, 6, 7, or more) can be assayed. Another in vitro selection system
which can be used is polysome/ribosome display. See, for example, Mattheakis, L.G.,
Bhatt, R.R. & Dower, W.J. (1994) Proc. Natl. Acad. Sci. USA 91: 9022-9026; and WO
00/27878.

Protocols for the reverse selection procedure, i.e., the selection of a particular binding site
from a mixed population using a single nucleic acid binding polypeptide, include SELEX
(systematic evolution of ligands by exponential enrichment) and microarray techniques.
The SELEX procedure has been well described. See, for example, Drolet, D.W., Jenison,
R.D., Smith, D.E., Pratt, D. & Hicke, B.J. (1999) Comb. Chem. High Throughput Screen
2: 271-278; Burden, D.A. & Osheroff, N. (1999) J. Biol. Chem. 274: 5227-5235;
Shultzaberger, R.K. & Schneider, T.D. (1999) Nucleic Acids Res. 27: 882-887; Marozzi,
A., Meneveri, R., Giacca, M., Gutierrez, M.I., Siccardi, A.G. & Ginelli, E. (1998) J.
Biotechnol. 15: 117-128; and US Patents No. 5,270,163; 5,475,096; 5,595,877;
5,670,637; 5,696,249; 5,817,785 and 6,331,398. A single nucleic acid binding
polypeptide is expressed, either in vitro or in vivo, and screened against a library of target
sequences. Nucleic acid binding polypeptides are collected (along with any bound target
sites) using an epitope tag (as above) or another suitable procedure. Bound target sites
are amplified by PCR and may be used in further rounds of selection, to enrich for the
optimal binding site, or sequenced.

Microarray technology provides a method of screening a particular polypeptide or nucleic
acid against thousands to millions of target sequences on a single solid support such as, for
example, a glass or nitrocellulose slide. For example, the members of a library encoding
polypeptides comprising 2 linked zinc fingers will bind a 6 bp recognition sequence.
Hence, there are 4096 (= 4^6) unique binding sites for such a library. All 4096 of these
sites can be arrayed onto a single glass slide, for example, allowing a specified 2-finger
peptide to be screened simultaneously against every possible binding site. The amount of
binding to each target sequence can be visualised and quantified using simple
fluorescence measurements. For example, the zinc finger peptide may be expressed in
vitro, or on the surface of phage. Isolated zinc finger peptides may contain an epitope tag
for labelling purposes, whereas bound phage can be detected using a primary antibody
against a phage coat protein, such as gVIII. A secondary antibody conjugated to, for
example, R-phycoerythrin, horseradish peroxidase or alkaline phosphatase, can be used to
provide a visible, quantifiable signal when a suitable substrate is applied. See, for
example, Bulyk et al. (2001) Proc. Natl. Acad. Sci. USA 98(13): 7158-7163, which is
incorporated, by reference, in its entirety.
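
The arithmetic behind the array size, and a simple way to rank array features by signal, can be sketched as follows. The fluorescence values are hypothetical placeholders, the function names are illustrative, and signals are assumed to be positive.

```python
def site_count(fingers, bases_per_finger=3):
    """Number of distinct sites when each finger reads a fixed number of bases."""
    return 4 ** (fingers * bases_per_finger)

def rank_sites(signals, top=3):
    """Return the strongest sites, with each signal scaled to the array maximum."""
    peak = max(signals.values())
    ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
    return [(site, value / peak) for site, value in ranked[:top]]

print(site_count(2))   # 4**6 = 4096 sites for a 2-finger peptide

# Hypothetical fluorescence readings (arbitrary units), for illustration only.
signals = {"GCGTGG": 820.0, "GCGTAG": 410.0, "AAAAAA": 12.0, "GCGGGG": 205.0}
print(rank_sites(signals, top=2))
```

Scaling to the strongest feature makes readings from different slides easier to compare, although it does not correct for background or spot-to-spot variation.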

Prediction of Binding Specificity

The screening approaches described above rely on the assay of large libraries of
randomly-selected natural zinc finger modules, to obtain one or more zinc finger modules
that optimally bind a particular target nucleic acid sequence. In order to simplify the
process further and ensure a more rapid selection of optimal zinc finger modules for a
particular target site, sub-libraries can be created. In this disclosure, the term
'sub-library' refers to a library of natural zinc finger modules that have been roughly
categorised according to their predicted binding specificity. For example, the total
population of natural zinc fingers can be sub-divided to create libraries comprising zinc
finger modules whose predicted binding sites are guanine (G) rich, cytosine (C) rich,
adenine (A) rich or thymine (T) rich. Alternatively, sub-libraries can be categorised as
binding G in the 3' position, in the central position, or in the 5' position of a nucleotide
triplet, etc. Alternatively, sub-libraries can be created which comprise zinc finger
modules predicted to bind a particular triplet sequence such as, for example, GGG, GGA,
GGC, GGT, GAG, GCG, GTG, etc. This approach combines knowledge of the modes of
zinc finger-nucleic acid recognition, gained from studies on artificial zinc finger variants,
with the benefits of combinatorial library selection. It also takes into account the fact that
concerted interactions between adjacent zinc fingers, i.e. overlapping contacts, can affect
the binding affinity and/or specificity of individual zinc fingers. See, for example,
Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-12033; Isalan, M.,
Choo, Y. & Klug, A. (1997) Proc Natl Acad Sci 94, 5617-5621. Thus, for example, a
composite binding polypeptide comprising two fingers, each having a predicted binding
specificity for a particular triplet, can be easily screened to determine if that pair of
fingers is compatible for binding to the 6-nucleotide target site
comprising their individual target sequences. This strategy is described further in the
Examples.
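
The rough categorisation into sub-libraries described above might be sketched as follows. The finger names, the predicted triplets, and the threshold used for "rich" (at least two of the three bases) are assumptions chosen for illustration; the disclosure does not fix a threshold.

```python
from collections import defaultdict

def categorise(triplet):
    """Return rough sub-library labels for a predicted 5'->3' triplet."""
    labels = []
    for base in "GCAT":
        if triplet.count(base) >= 2:      # assumed threshold for "rich"
            labels.append(f"{base}-rich")
    for name, index in (("5'", 0), ("central", 1), ("3'", 2)):
        if triplet[index] == "G":
            labels.append(f"G at {name} position")
    labels.append(f"binds {triplet}")
    return labels

def build_sublibraries(predicted):
    """Group finger names by every label that their predicted triplet earns."""
    sublibraries = defaultdict(list)
    for finger, triplet in predicted.items():
        for label in categorise(triplet):
            sublibraries[label].append(finger)
    return dict(sublibraries)

# Hypothetical finger names and predicted triplets, for illustration only.
predicted = {"fingerA": "GCG", "fingerB": "GGA", "fingerC": "TAT"}
for label, fingers in sorted(build_sublibraries(predicted).items()):
    print(label, fingers)
```

A finger may fall into several sub-libraries at once, which matches the overlapping categories (base-rich, positional, and exact-triplet) listed in the text.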

For the process of creating sub-libraries of natural zinc fingers according to predicted
binding preference, the rules set forth in international patent applications WO 96/06166,
WO 98/53057, WO 98/53058, WO 98/53059 and WO 98/53060, and described in more
detail below, are used. These rules allow the assignment of an amino acid residue, in an



wo 02/099084 



PCT/US02/22272 




appropriate position of the recognition region of a zinc finger module (generally
comprising amino acids -1 through +6, with respect to the start of the alpha-helical
portion of the finger), which will bind a specified nucleotide in a triplet or quadruplet
target subsite. However, these rules can also be used to predict the sequence of a target
subsite that would be preferentially bound by a zinc finger of given amino acid sequence.
In particular, the identity of the amino acid residing at a particular position in the
recognition region of a natural zinc finger module can be used to predict the identity of a
nucleotide at a particular location in a target subsite. These 'rules' should be considered
as a guide to target site preference and not a guaranteed prediction, as binding site
specificity may be determined by variations elsewhere in the zinc finger module (i.e.
outside of the recognition region), may be influenced by context, or may be influenced by
factors as yet unknown. It should also be noted that some rules may be more generally
applicable than others.

In the application of these rules, it should be noted that the recognition region of a zinc
finger aligns such that the N-terminal to C-terminal sequence of the finger is arranged
along the nucleic acid strand to which it binds in a 3'-to-5' direction. As a result, when a
zinc finger sequence and a nucleic acid sequence (to which the finger binds) are aligned,
the primary interactions occur between the zinc finger and the 'minus' strand of the
nucleic acid sequence (i.e. the strand which has a 3'-to-5' orientation). Furthermore, as
stated above, the recognition region of a zinc finger comprises amino acids -1 through
+6, with respect to the start of the alpha-helical portion of the finger. With respect to a
particular zinc finger, an amino acid residue designated ++2 refers to the residue present
in the adjacent (in the C-terminal direction) zinc finger, which (in certain instances)
buttresses an amino acid-nucleotide interaction and/or participates in a cross-strand
interaction with a nucleotide.

Thus, the following set of rules can be used to predict a 3 bp target subsite for a given
natural zinc finger module: (a) if the 5' base in the triplet is G, then position +6 in the
α-helix is Arg; or position +6 is Ser or Thr and position ++2 is Asp; (b) if the 5' base in the
triplet is A, then position +6 in the α-helix is Gln and ++2 is not Asp; (c) if the 5' base in
the triplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp; (d) if
the 5' base in the triplet is C, then position +6 in the α-helix may be any amino acid,
provided that position ++2 in the α-helix is not Asp; (e) if the central base in the triplet is
G, then position +3 in the α-helix is His; (f) if the central base in the triplet is A, then
position +3 in the α-helix is Asn; (g) if the central base in the triplet is T, then position +3
in the α-helix is Ala, Ser or Val; provided that if it is Ala, then one of the residues at -1 or
+6 is a small residue; (h) if the central base in the triplet is C, then position +3 in the
α-helix is Ser, Asp, Glu, Leu, Thr or Val; (i) if the 3' base in the triplet is G, then position
-1 in the α-helix is Arg; (j) if the 3' base in the triplet is A, then position -1 in the α-helix
is Gln; (k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn or Gln; (l)
if the 3' base in the triplet is C, then position -1 in the α-helix is Asp.
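
Read in the predictive direction, rules (a) to (l) map residues at positions -1, +3, +6 and ++2 to candidate bases. The sketch below is one naive way to invert them; it is not the authors' implementation. The set of "small" residues (Gly, Ala, Ser) is an assumption, since the text does not define it, and a missing C-terminal neighbour is treated as "++2 is not Asp". The Zif268 helix sequences used in the example come from the general literature rather than from this section.

```python
# Assumed set of "small" residues; the text does not define it.
SMALL = {"A", "G", "S"}

def predict_triplet(helix, next_plus2=None):
    """Return candidate bases (5', central, 3') for a 7-residue recognition helix.

    helix lists positions -1, +1, +2, +3, +4, +5, +6 in one-letter code;
    next_plus2 is position +2 of the C-terminal neighbour ("++2"), if any.
    """
    if len(helix) != 7:
        raise ValueError("expected residues for positions -1 to +6")
    minus1, plus3, plus6 = helix[0], helix[3], helix[6]
    asp_at_pp2 = next_plus2 == "D"

    five = set()
    if plus6 == "R" or (plus6 in "ST" and asp_at_pp2):
        five.add("G")                       # rule (a)
    if plus6 == "Q" and not asp_at_pp2:
        five.add("A")                       # rule (b)
    if plus6 in "ST" and asp_at_pp2:
        five.add("T")                       # rule (c)
    if not asp_at_pp2:
        five.add("C")                       # rule (d): any residue at +6

    central = set()
    if plus3 == "H":
        central.add("G")                    # rule (e)
    if plus3 == "N":
        central.add("A")                    # rule (f)
    if plus3 in "SV" or (plus3 == "A" and (minus1 in SMALL or plus6 in SMALL)):
        central.add("T")                    # rule (g)
    if plus3 in "SDELTV":
        central.add("C")                    # rule (h)

    three = set()
    if minus1 == "R":
        three.add("G")                      # rule (i)
    if minus1 == "Q":
        three.update("AT")                  # rules (j) and (k)
    if minus1 == "N":
        three.add("T")                      # rule (k)
    if minus1 == "D":
        three.add("C")                      # rule (l)
    return five, central, three

# Zif268 finger 1 (helix RSDELTR) followed by a finger with Asp at +2.
print(predict_triplet("RSDELTR", next_plus2="D"))
```

Because several rules overlap, and rule (d) admits C whenever ++2 is not Asp, the function returns sets of candidates rather than a single base. For the Zif268 finger 1 example the candidates reduce to G, C and G, consistent with the 5'-GCG-3' site stated earlier.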

Furthermore, a natural zinc finger module may be capable of binding specifically to a
four-nucleotide target subsite that overlaps with the target subsite of an adjacent zinc
finger. In this case a different set of 'rules' can be used to determine predicted binding
sites for each zinc finger module. Accordingly, in the description below, the overlapping
4 bp binding site is described such that position 4 is the 5' base of a typical triplet binding
site, position 3 is the central position of a typical triplet, position 2 is the 3' position of a
typical triplet, and position 1 is the complement of the nucleotide which is contacted by
the cross strand interaction from the +2 position of the zinc finger module. Position 1 can
also be considered to be the 5' base of the triplet or quadruplet contacted by an adjacent
(in the N-terminal direction) finger, if present.

Binding to each base of a quadruplet by an α-helical zinc finger nucleic acid binding
motif in a natural protein can be predicted with reference to the following rules: (a) if
base 4 in the quadruplet is G, then position +6 in the α-helix is Arg or Lys; (b) if base 4 in
the quadruplet is A, then position +6 in the α-helix is Glu, Asn or Val; (c) if base 4 in the
quadruplet is T, then position +6 in the α-helix is Ser, Thr, Val or Lys; (d) if base 4 in the
quadruplet is C, then position +6 in the α-helix is Ser, Thr, Val, Ala, Glu or Asn; (e) if
base 3 in the quadruplet is G, then position +3 in the α-helix is His; (f) if base 3 in the
quadruplet is A, then position +3 in the α-helix is Asn; (g) if base 3 in the quadruplet is T,
then position +3 in the α-helix is Ala, Ser or Val; provided that if it is Ala, then one of the
residues at -1 or +6 is a small residue; (h) if base 3 in the quadruplet is C, then position
+3 in the α-helix is Ser, Asp, Glu, Leu, Thr or Val; (i) if base 2 in the quadruplet is G,
then position -1 in the α-helix is Arg; (j) if base 2 in the quadruplet is A, then position -1
in the α-helix is Gln; (k) if base 2 in the quadruplet is T, then position -1 in the α-helix is
His or Thr; (l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp or
His; (m) if base 1 in the quadruplet is G, then position +2 is Glu; (n) if base 1 in the
quadruplet is A, then position +2 is Arg or Gln; (o) if base 1 in the quadruplet is C, then
position +2 is Asn, Gln, Arg, His or Lys; (p) if base 1 in the quadruplet is T, then position
+2 is Ser or Thr.

The above rules may be further refined to those described below: (a) if base 4 in the
quadruplet is G, then position +6 in the α-helix is Arg; or position +6 is Ser or Thr and
position ++2 is Asp; (b) if base 4 in the quadruplet is A, then position +6 in the α-helix is
Gln and ++2 is not Asp; (c) if base 4 in the quadruplet is T, then position +6 in the
α-helix is Ser or Thr and position ++2 is Asp; (d) if base 4 in the quadruplet is C, then
position +6 in the α-helix may be any amino acid, provided that position ++2 in the
α-helix is not Asp; (e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;
(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn; (g) if base 3 in
the quadruplet is T, then position +3 in the α-helix is Ala, Ser or Val; provided that if it is
Ala, then one of the residues at -1 or +6 is a small residue; (h) if base 3 in the quadruplet
is C, then position +3 in the α-helix is Ser, Asp, Glu, Leu, Thr or Val; (i) if base 2 in the
quadruplet is G, then position -1 in the α-helix is Arg; (j) if base 2 in the quadruplet is A,
then position -1 in the α-helix is Gln; (k) if base 2 in the quadruplet is T, then position -1
in the α-helix is Asn or Gln; (l) if base 2 in the quadruplet is C, then position -1 in the
α-helix is Asp; (m) if base 1 in the quadruplet is G, then position +2 is Asp; (n) if base 1 in
the quadruplet is A, then position +2 is not Asp; (o) if base 1 in the quadruplet is C, then
position +2 is not Asp; (p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.

The rules therefore predict that the presence of an Asp (D) residue at position +2 will
preclude binding to either A or C by an amino acid at position +6 in an adjacent
N-terminal finger. Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-12033;
Isalan, M., Choo, Y. & Klug, A. (1997) Proc Natl Acad Sci 94, 5617-5621. Therefore,
natural zinc fingers containing Asp, Glu, Asn or Gln at +6 are likely to be incompatible
with any C-terminal finger containing an Asp residue at position +2. Although there are
many such rules to describe the overlap between adjacent zinc fingers, a certain degree of
degeneracy exists in these rules. Nonetheless, physical selection procedures (e.g., library
construction and screening) can be used to extract optimal pairs of fingers for any given
target subsite interface.
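
The predicted incompatibility between a residue at +6 of one finger and an Asp at +2 of its C-terminal neighbour can be expressed as a simple filter. The function name and the example helices below are hypothetical; each helix string is assumed to list positions -1 to +6 in one-letter code.

```python
def compatible_pair(n_terminal_helix, c_terminal_helix):
    """Return False when the +6 / +2 overlap rule predicts a clash between fingers."""
    plus6 = n_terminal_helix[6]      # position +6 of the N-terminal finger
    plus2 = c_terminal_helix[2]      # position +2 of the C-terminal finger
    return not (plus2 == "D" and plus6 in "DENQ")

# Hypothetical helices: Gln at +6 followed by a finger with Asp at +2.
print(compatible_pair("RSDHLTQ", "RSDHLTT"))   # False
# The same N-terminal finger followed by a finger with Ala at +2.
print(compatible_pair("RSDHLTQ", "RSAHLTT"))   # True
```

A filter of this kind only removes pairs that the rules flag as likely to clash; given the degeneracy noted above, the remaining pairs would still need to be tested by physical selection.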

Not all natural zinc fingers have a DNA-binding function. For example, it is known that
many zinc fingers, such as those from TFIIIA, bind to RNA (Clemens, K. R. et al., (1993)
Science 260: 530-533; Bogenhagen, D.F. (1993) Mol. Cell Biol. 13: 5149-5158; Searles,
M. A. et al., J. Mol. Biol. 301: 47-60 (2000)). The rules governing RNA binding by zinc
fingers are less well understood than those of DNA binding, but some RNA binding zinc
fingers can be identified on the basis of a characteristic sequence motif. Clemens, K. R.
et al., (1993) Science 260: 530-533; Bogenhagen, D.F. (1993) Mol. Cell Biol. 13:
5149-5158; Searles, M. A. et al. (2000) J. Mol. Biol. 301: 47-60. Furthermore, some zinc
fingers, such as those from the protein Ikaros, are able to form protein-protein
interactions. Such zinc fingers often contain large hydrophobic patches. Mackay, J. P. &
Crossley, M. (1998) Trends Biochem. Sci. 23: 1-4.


To this end, applied bioinformatic processing can help to determine which candidates in a
particular genome are best suited to fulfilling a particular function, such as DNA-binding.
In the case of zinc fingers, numerous documented databases exist denoting amino acid
residues that are most likely to be found at particular positions within a DNA-binding
zinc finger. See, for example, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37,
12026-12033; Choo, Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125; WO
98/53060; WO 98/53059; WO 98/53058. As an example, disclosed herein is a database
of approximately 200 natural human zinc fingers which have been selected (on the basis
of coded contacts) as having potentially useful DNA-binding activity (see Example 1).
Also disclosed in Example 1 are the predicted DNA target sequences of these zinc
fingers, assigned according to the rules set out above.

As the human genome contains almost 700 zinc finger-containing proteins, there are
many other candidates that can be included in a more inclusive library of natural zinc
fingers. A selection of these is disclosed in Example 2.

Similar work can be carried out in other organisms, such as farm (cows, pigs, sheep,
chickens, etc.), laboratory (monkeys, rats, mice, etc.) and domestic (dogs, cats, etc.)
animals. In this case, it is necessary to select natural zinc finger modules from the
respective genomes of such organisms. Examples of zinc finger modules which have
been selected from mouse, chicken and certain plant genomes are disclosed in the
Examples.

d. Zinc Finger Chimeric Peptides 

In a preferred embodiment, the composite binding polypeptides described herein 
comprise chimeric nucleic acid binding polypeptides. 

A chimeric nucleic acid binding polypeptide, also referred to as a fusion
polypeptide, comprises a binding domain (comprising a number of nucleic acid binding
polypeptide modules or fingers) designed to bind specifically to a target nucleotide
sequence, together with one or more further biological effector domains or functional
domains. The terms "biological effector domain" and "functional domain" refer to any
polypeptide (or functional fragment thereof) that has a biological function. Included are
enzymes, receptors, regulatory domains, transcriptional activation or repression domains,
binding sequences, dimerisation, trimerisation or multimerisation sequences, sequences
involved in protein transport, localisation sequences such as subcellular localisation
sequences, nuclear localisation, protein targeting or signal sequences. Furthermore,
biological effector domains may comprise polypeptides involved in chromatin
remodelling, chromatin condensation or decondensation, DNA replication, transcription,
translation, protein synthesis, etc. Fragments of such polypeptides comprising the
relevant activity (i.e., functional fragments) are also included in this definition. Preferred
biological effector domains include transcriptional modulation domains such as
transcriptional activators and transcriptional repressors, as well as their functional
fragments.

The effector domain(s) can be covalently or non-covalently attached to the 
binding domain. 

Chimeric nucleic acid binding polypeptides preferably comprise transcription
factor activity, for example, a transcriptional modulation activity such as transcriptional
activation or transcriptional repression activity. For example, a zinc finger chimeric
polypeptide may comprise a binding domain designed to bind specifically to a particular
nucleotide sequence, and one or more further biological effector domains, preferably a
transcriptional activation or repression domain, as described in further detail below. The
zinc finger chimeric polypeptide may comprise one or more zinc fingers or zinc finger
binding modules.

Preferably, in the case of a chimeric polypeptide comprising transcriptional
modulation activity, a nuclear localisation domain is attached to the DNA binding domain
to direct the chimeric polypeptide to the nucleus.

Generally, a chimeric nucleic acid binding polypeptide, such as a chimeric zinc
finger polypeptide, can also include an effector domain to regulate gene expression. The
effector domain can be directly derived from a basal or regulated transcription factor such
as, for example, transactivators, repressors, and proteins that bind to insulator or silencer
sequences. See, for example, Choo & Klug (1995) Curr. Opin. Biotech. 6: 431-436;
Choo, Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125; Rebar & Pabo (1994)
Science 263: 671-673; Jamieson et al. (1994) Biochem. 33: 5689-5695; Goodrich et al.
(1996) Cell 84: 825-830; Vostrov, A. A. & Quitschke, W. W. (1997) J. Biol. Chem. 272:
33353-33359 and WO 00/41566 and references disclosed therein. Other useful domains
are derived from receptors such as, for example, nuclear hormone receptors (Kumar, R. &
Thompson, E. B. (1999) Steroids 64: 310-319), and their co-activators and co-repressors
(Ugai, H. et al. (1999) J. Mol. Med. 77: 481-494).

A chimeric nucleic acid binding polypeptide can also include other domains that
may be advantageous within the context of the control of gene expression. Such domains
include, but are not limited to, protein-modifying domains such as histone
acetyltransferases, kinases, methylases and phosphatases, which can silence or activate
genes by modifying DNA structure or the proteins that associate with nucleic acids. See,
for example, Wolffe, Science 272: 371-372 (1996); Taunton et al., Science 272: 408-411
(1996); Hassig et al., Proc. Natl. Acad. Sci. USA 95: 3519-3524 (1998); Wang, Trends
Biochem. Sci. 19: 373-376 (1994); and Schonthal, Semin. Cancer Biol. 6: 239-248
(1995). Additional useful effector domains include those that modify or rearrange nucleic
acid molecules such as methyltransferases, endonucleases, ligases, recombinases etc.
See, for example, Wood, Ann. Rev. Biochem. 65: 135-167 (1996); Sadowski, FASEB J.
7: 760-767 (1993); Cheng, Curr. Opin. Struct. Biol. 5: 4-10 (1995); Wu et al. (1995)
Proc. Natl. Acad. Sci. USA 92:344-348; Nahon & Raveh, Nucleic Acids Res. 1998 Mar
1;26(5):1233-9; Smith et al., Nucleic Acids Res. 1999 Jan 15;27(2):674-81; and Smith et
al. (2000) Nucleic Acids Res. Sept 1; 28(17):3361-9. It will be appreciated that the
biological effector domain portion of the chimeric polypeptide may itself also comprise
such activities, without the need for further additional domains.

For the purpose of gene activation, zinc finger domains may be fused to the VP64
domain. See, for example, Seipel et al., EMBO J. 11: 4961-4968 (1996). Other preferred
transactivator domains include the herpes simplex virus (HSV) VP16 domain (Hagmann
et al. (1997) J. Virol. 71: 5952-5962; Sadowski et al. (1988) Nature 335:563-564),
transactivation domain 1 and/or domain 2 of the p65 subunit of nuclear factor-κB
(NF-κB) (Schmitz, M. L. et al. (1995) J. Biol. Chem. 270: 15576-15584). Other transcription
factors are reviewed in, for example, Lekstrom-Himes J. & Xanthopoulos K. G. (C/EBP
family) J. Biol. Chem. 273: 28545-28548 (1998); Bieker, J. L et al., (globin gene
transcription factors) Ann. N. Y. Acad. Sci. 850: 64-69 (1998), and Parker, M. G.
(estrogen receptors) Biochem. Soc. Symp. 63: 45-50 (1998).



Use of a transactivation domain from the estrogen receptor is disclosed in
Metivier, R., Petit, FG., Valotaire, Y. & Pakdel, F. (2000) Mol. Endocrinol. 14: 1849-1871.
Furthermore, activation domains from the globin transcription factors EKLF
(Pandya, K., Donze, D. & Townes T. (2001) J. Biol. Chem. 276: 8239-8243) may also be
used, as well as a transactivation domain from FKLF (Asano, H., Li, XS. &
Stamatoyannopoulos, G. (1999) Mol. Cell Biol. 19: 3571-3579). C/EBP transactivation
domains may also be employed in the methods described herein. The C/EBP epsilon
activation domain is disclosed in Verbeek, W., Gombart, AF, Chumakov, AM, Muller, C,
Friedman, AD, & Koeffler, HP (1999) Blood 15: 3327-3337. Kowenz-Leutz, E. & Leutz,
A. (1999) Mol. Cell. 4: 735-743 disclose the use of the C/EBP tau activation domain,
while the C/EBP alpha transactivation domain is disclosed in Tao, H., & Umek, RM.
(1999) DNA Cell Biol. 18: 75-84.

It is known that zinc finger proteins may be fused to transcriptional repression
domains such as the Kruppel-associated box (KRAB) domain to form powerful
repressors. These domains are known to repress expression of a reporter gene even when
bound to sites a few kilobase pairs upstream from the promoter of the gene (Margolin et
al., 1994, Proc. Natl. Acad. Sci. USA 91: 4509-4513). Hence, in certain embodiments,
the KRAB repressor domain from the human KOX-1 protein is used to repress gene
activity (Moosmann et al., Biol. Chem. 378: 669-677 (1997); Thiesen et al., New
Biologist 2: 363-374 (1990)). In additional embodiments, larger fragments of the KOX-1
protein comprising the KRAB domain, up to and including full-length KOX protein, are
used as transcriptional repression domains. See, for example, Abrink et al. (2001) Proc.
Natl. Acad. Sci. USA 98:1422-1426. Other preferred transcriptional repressor domains
are known in the art and include, for example, the engrailed domain (Han et al., EMBO J.
12: 2723-2733 (1993)), the snag domain (Grimes et al., Mol. Cell Biol. 16: 6263-6272
(1996)) and the transcriptional repression domain of v-erbA (e.g., Urnov et al. (2000)
EMBO J. 19:4074-4090; Sap et al. (1989) Nature 340:242-244 and Ciana et al. (1999)
EMBO J. 17:7382-7394).

Biological effector domains can be covalently or non-covalently linked to a
binding domain. In one embodiment, a covalent linker comprises a flexible amino acid
sequence; fusion polypeptides according to this embodiment comprise a nucleic acid
binding domain fused, by an amino acid linker, to a biological effector domain.
Alternatively, a covalent linker may comprise a synthetic, non-amino acid based
chemical linker, for example, polyethylene glycol. Synthetic linkers are commercially
available, and methods of chemical conjugation are known in the art. Covalent linkers
may comprise flexible or structured linkers, as described above.

Non-covalent linkages between a nucleic acid binding domain and an effector
domain can be formed using, for example, leucine zipper/coiled coil domains, or other
naturally occurring or synthetic dimerisation domains. See, e.g., Luscher, B. & Larsson,
L. G. Oncogene 18:2955-2966 (1999) and Gouldson, P. R. et al.,
Neuropsychopharmacology 23: S60-S77 (2000).

The expression of composite binding polypeptides (for example, zinc finger
polypeptides) can be controlled by tissue specific promoter sequences such as, for
example, the lck promoter (thymocytes, Gu, H. et al., Science 265: 103-106 (1994)); the
human CD2 promoter (T-cells and thymocytes, Zhumabekov, T. et al., J. Immunological
Methods 185: 133-140 (1995)); the alpha A-crystallin promoter (eye lens, Lakso, M. et
al., Proc. Natl. Acad. Sci. 89: 6232-6236 (1992)); the alpha-calcium-calmodulin-dependent
kinase II promoter (hippocampus and neocortex, Tsien, J. et al., Cell 87: 1327-1338
(1996)); the whey acidic protein promoter (mammary gland, Wagner, K.-U. et al.,
Nucleic Acids Res. 25: 4323-4330 (1997)); the aP2 enhancer/promoter (adipose tissue,
Barlow C. et al., Nucleic Acids Res. 25: 2543-2545 (1997)); the aquaporin-2 promoter
(renal collecting duct, Nelson R. et al., Am. J. Physiol. 275: C216-C226 (1998)); and the
mouse myogenin promoter (skeletal muscle, Grieshammer, U. et al., Dev. Biol. 197: 234-247
(1998)). The expression of such polypeptides can also be controlled by inducible
systems, in particular, controlled by small molecule induction such as the
tetracycline-controlled systems (tet-on and tet-off), the RU-486 or tamoxifen hormone
analogue systems, or the radiation-inducible early growth response gene-1 (EGR1) promoter.

These promoter constructs and inducible systems have the benefit of being able to
provide organ-specific and/or inducible expression of target genes for use in applications
such as gene therapy and transgenic animals.
e. Vectors

The nucleic acid encoding the nucleic acid binding polypeptide such as a zinc
finger polypeptide can be incorporated into intermediate vectors and transformed into
prokaryotic or eukaryotic cells for expression or DNA amplification.

As used herein, vector (or plasmid) preferably refers to discrete elements that are
used to introduce heterologous nucleic acid into cells for either expression or replication
thereof. The term "heterologous to the cell" means that the sequence does not naturally
exist in the genome of the host cell but has been introduced into the cell. The term
"introduced into" means that a procedure is performed on a cell, tissue, organ or organism
such that the gene encoding the nucleic acid binding polypeptide (for example, a zinc
finger polypeptide) previously absent from the cell or cells is then present in the cell or
cells. Alternatively, or in addition, the gene may be initially present in the cell or cells
and subsequently altered by introduction of heterologous DNA. A heterologous sequence
may include a modified sequence introduced at any chromosomal site, or which is not
integrated into a chromosome, or which is introduced by homologous recombination such
that it is present in the genome in the same position as the native allele. Selection and use
of such vectors are well within the skill of the person of ordinary skill in the art. Many
vectors are available, and selection of an appropriate vector will depend on the intended
use of the vector, i.e. whether it is to be used for DNA amplification or for nucleic acid
expression, the size of the DNA to be inserted into the vector, and the host cell to be
transformed with the vector, etc. Another consideration is whether the vector is to remain
episomal or integrate into the host genome. Suitable vectors may be of bacterial, viral,
insect or mammalian origin. Intermediate vectors for storage or manipulation of the
nucleic acid encoding the nucleic acid binding polypeptide, or for expression and
purification of the polypeptide are typically of prokaryotic origin. Most expression
vectors are shuttle vectors, i.e. they are capable of replication in at least one class of
organisms but can be transfected into another class of organisms for expression. For
example, a vector is cloned in E. coli and then the same vector is transfected into yeast or
mammalian cells even though it is not capable of replicating independently of the host
cell chromosome. DNA may also be replicated by insertion into the host genome. The
nucleic acid binding polypeptides such as zinc finger polypeptides described here are
preferably inserted into a vector suitable for expression in mammalian cells.

Prokaryote, yeast and higher eukaryote cells may be used for replicating DNA and
producing the nucleic acid binding protein. Suitable prokaryotes include eubacteria, such
as Gram-negative or Gram-positive organisms, such as E. coli, e.g. E. coli K-12 strains,
DH5α and HB101, or Bacilli. Further hosts suitable for the vectors include eukaryotic
microbes such as filamentous fungi or yeast, e.g. Saccharomyces cerevisiae. Higher
eukaryotic cells include insect and vertebrate cells, particularly mammalian cells
including human cells or nucleated cells from other multicellular organisms. In recent
years propagation of vertebrate cells in culture (tissue culture) has become a routine
procedure. Examples of useful mammalian host cell lines are epithelial or fibroblastic cell
lines such as Chinese hamster ovary (CHO) cells, NIH 3T3 cells, HeLa cells or 293T
cells. The host cells referred to in this disclosure comprise cells in in vitro culture as well
as cells that are within a host animal.

Each vector contains various components depending on its function (amplification
of DNA or expression of DNA) and the host cell for which it is compatible. The vector
components generally include, but are not limited to, one or more of the following: an
origin of replication, one or more selectable marker genes, a promoter, an enhancer
element, a transcription termination sequence and a signal sequence.

Both expression and cloning vectors generally contain nucleic acid sequences that
enable the vector to replicate in one or more selected host cells. Typically in cloning
vectors, this sequence is one that enables the vector to replicate independently of the host
chromosomal DNA, and includes origins of replication or autonomously replicating
sequences. Such sequences are well known for a variety of bacteria, yeast and viruses.
The origin of replication from the plasmid pBR322 is suitable for most Gram-negative
bacteria, the 2μ plasmid origin is suitable for yeast, and various viral origins (e.g. SV40,
polyoma, adenovirus) are useful for cloning vectors in mammalian cells. Generally, the
origin of replication component is not needed for mammalian expression vectors unless
these are used in mammalian cells competent for high level DNA replication, such as
COS cells.

Advantageously, an expression and cloning vector contains a selection gene, also
referred to as a selectable marker. This gene encodes a protein necessary for the survival or
growth of transformed host cells grown in a selective culture medium. Host cells not
transformed with the vector containing the selection gene will not survive in the culture
medium. Typical selection genes encode proteins that confer resistance to antibiotics and
other toxins, e.g. ampicillin, neomycin, methotrexate or tetracycline, complement
auxotrophic deficiencies, or supply critical nutrients not available from complex media.

Since the replication of vectors is conveniently done in E. coli, an E. coli genetic
marker and an E. coli origin of replication are advantageously included. These can be
obtained from E. coli plasmids, such as pBR322, Bluescript® vector or a pUC plasmid,
e.g. pUC18 or pUC19, which contain both E. coli replication origin and E. coli genetic
marker conferring resistance to antibiotics, such as ampicillin and tetracycline. Vectors
such as these are commercially available.

As to a selective gene marker appropriate for yeast, any marker gene can be used
which facilitates the selection for transformants due to the phenotypic expression of the
marker gene. Suitable markers for yeast are, for example, those conferring resistance to
antibiotics G418, hygromycin or bleomycin, or those providing prototrophy in an auxotrophic
yeast mutant, for example the URA3, LEU2, LYS2, TRP1, or HIS3 gene.

Suitable selectable markers for mammalian cells are those that enable the
identification of cells competent to take up nucleic acid, such as dihydrofolate reductase
(DHFR, methotrexate resistance), thymidine kinase, or genes conferring resistance to
neomycin, G418 or hygromycin. The mammalian cell transformants are placed under
selection pressure that only those transformants which have taken up and are
expressing the marker are adapted to survive. In the case of a DHFR or
glutamine synthase (GS) marker, selection pressure can be imposed by culturing the
transformants under conditions in which the pressure is progressively increased, thereby
leading to amplification (at its chromosomal integration site) of both the selection gene
and the linked DNA that encodes the nucleic acid binding protein. Amplification is the
process by which genes in greater demand (such as one encoding a protein that is critical
for growth), together with closely associated genes (such as one encoding a composite
binding polypeptide), are reiterated in tandem within the chromosomes of recombinant
cells. Increased quantities of desired protein are usually synthesised from this amplified
DNA.

Expression and cloning vectors usually contain control sequences that are
recognised by the host organism and are operably linked to the nucleic acid encoding a
nucleic acid binding polypeptide. The term "control sequences" is intended to include, at
a minimum, components whose presence can influence expression, and can also include
additional components whose presence is advantageous, for example, leader sequences
and fusion partner sequences. The term "operably linked" means that the components
described are in a relationship permitting them to function in their intended manner.
Typical control sequences include promoters, enhancers and other expression regulation
signals such as terminators. Such a promoter may be inducible or constitutive. A
regulatory sequence operably linked to a coding sequence is ligated in such a way that
expression of the coding sequence is achieved under conditions compatible with the
control sequences.

The term promoter is well known in the art and encompasses nucleic acid regions
ranging in size and complexity from minimal promoters to promoters including upstream
elements and enhancers. Suitable promoters for use in prokaryotic and eukaryotic cells
are well known in the art, and described in, for example, Current Protocols in Molecular
Biology (Ausubel et al., eds., 1994) and Molecular Cloning. A Laboratory Manual
(Sambrook et al., 2nd ed. 1989).

Promoters suitable for use with prokaryotic hosts include, for example, the
β-lactamase and lactose promoter systems, alkaline phosphatase, the tryptophan (Trp)
promoter system and hybrid promoters such as the tac promoter. Their nucleotide
sequences have been published, thereby enabling the skilled worker to ligate them to
DNA encoding a composite binding protein, using linkers or adapters to supply any
required restriction sites. Promoters for use in bacterial systems will also generally
contain an adjacent ribosome binding site (e.g., a Shine-Dalgarno sequence) operably
linked to the DNA encoding the composite binding polypeptide.

Preferred expression vectors are bacterial expression vectors, which comprise a
promoter of a bacteriophage such as phage lambda, SP6, T3 or T7, for example, which is
capable of functioning in bacteria. In one of the most widely used expression systems,
the nucleic acid encoding the fusion protein can be transcribed from a vector by T7 RNA
polymerase (Studier et al., Methods in Enzymol. 185: 60-89, 1990). In the E. coli
BL21(DE3) host strain, used in conjunction with pET vectors, the T7 RNA polymerase is
produced from the λ-lysogen DE3 in the host bacterium, and its expression is under the
control of the IPTG inducible lac UV5 promoter. This system has been employed
successfully for over-production of many proteins. Alternatively, the polymerase gene
may be introduced on a lambda phage by infection with an int- phage such as the CE6
phage, which is commercially available (Novagen, Madison, WI, USA). Other vectors
include vectors containing the lambda PL promoter such as pLEX (Invitrogen, NL),
vectors containing the trc promoters such as pTrcHisXpress™ (Invitrogen), or pTrc99
(Pharmacia Biotech, SE), or vectors containing the tac promoter such as pKK223-3
(Pharmacia Biotech), or pMAL (New England Biolabs, Beverly, MA, USA). A suitable
vector for expression of proteins in mammalian cells is the CMV enhancer-based vector
such as pEVRF (Matthias et al. (1989) Nucleic Acids Res. 17, 6418).

Suitable promoting sequences for use with yeast hosts may be regulated or
constitutive and are preferably derived from a highly expressed yeast gene, especially a
Saccharomyces cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or
ADHII gene, the acid phosphatase (PHO5) gene, a promoter of the yeast mating
pheromone genes coding for the a- or α-factor or a promoter derived from a gene
encoding a glycolytic enzyme such as the promoter of the enolase,
glyceraldehyde-3-phosphate dehydrogenase (GAP), 3-phosphoglycerate kinase (PGK), hexokinase,
pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase,
3-phosphoglycerate mutase, pyruvate kinase, triose phosphate isomerase, phosphoglucose
isomerase or glucokinase genes, or a promoter from the TATA binding protein (TBP)
gene can be used. Furthermore, it is possible to use hybrid promoters comprising
upstream activation sequences (UAS) of one yeast gene and downstream promoter
elements including a functional TATA box of another yeast gene, for example a hybrid
promoter including the UAS(s) of the yeast PHO5 gene and downstream promoter
elements including a functional TATA box of the yeast GAP gene (PHO5-GAP hybrid
promoter). A suitable constitutive PHO5 promoter is, for example, a shortened acid
phosphatase PHO5 promoter devoid of the upstream regulatory elements (UAS) such as
the PHO5 (-173) promoter element starting at nucleotide -173 and ending at nucleotide -9
of the PHO5 gene.

The promoter is typically selected from promoters which are found in animal
cells, although prokaryotic promoters and promoters functional in other eukaryotic cells
can be used. Typically, the promoter is derived from viral or animal gene sequences, may
be constitutive or inducible, and may be strong or weak.

Viral promoters can be derived from viruses such as polyoma virus, adenoviruses,
adeno-associated viruses, poxviruses (e.g., fowlpox virus), papilloma viruses (e.g., BPV),
avian sarcoma virus, cytomegalovirus (CMV), herpesviruses, retroviruses, lentiviruses
and simian virus 40 (SV40). An example of a relatively weak viral promoter is the thymidine
kinase promoter from herpes simplex virus (HSV-TK).

Mammalian derived promoters can be heterologous to the animal in which
composite binding polypeptide (such as zinc finger polypeptide) expression is to occur, or
they can be host sequences. In some applications it is preferable to use a promoter that is
active in all cell types; however, it is often preferable to use promoter sequences that are
active in specific cell types only.

The actin promoter and the strong ribosomal protein promoter are examples of
promoter sequences that are active in all cell types. In contrast, by using promoters that
are specific for certain cell or tissue types, the gene encoding the nucleic acid binding
polypeptide can be expressed only in the required cell or tissue types. This may be of
extreme importance for applications such as gene therapy, and for the production of
viable transgenic animals. Such promoters are known in the art and include the lck
promoter (thymocytes, Gu, H. et al., Science 265: 103-106 (1994)), the human CD2
promoter (T-cells and thymocytes, Zhumabekov, T. et al., J. Immunological Methods
185: 133-140 (1995)), the alpha A-crystallin promoter (eye lens, Lakso, M. et al., Proc.
Natl. Acad. Sci. 89: 6232-6236 (1992)), the alpha-calcium-calmodulin-dependent kinase
II promoter (hippocampus and neocortex, Tsien, J. et al., Cell 87: 1327-1338 (1996)), the
whey acidic protein promoter (mammary gland, Wagner, K.-U. et al., Nucleic Acids Res.
25: 4323-4330 (1997)), the aP2 enhancer/promoter (adipose tissue, Barlow C. et al.,
Nucleic Acids Res. 25: 2543-2545 (1997)), the aquaporin-2 promoter (renal collecting
duct, Nelson R. et al., Am. J. Physiol. 275: C216-C226 (1998)), the mouse myogenin
promoter (skeletal muscle, Grieshammer, U. et al., Dev. Biol. 197: 234-247 (1998)), and the
retinoblastoma gene promoter (nervous system, Jiang, Z. et al., J. Biol. Chem. 276: 593-600
(2001)).
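
By way of illustration only, the tissue-specific promoters recited above can be arranged as a simple lookup table from which candidates for a given target tissue are retrieved. The following Python sketch is not part of the disclosed methods: the names TISSUE_SPECIFIC_PROMOTERS and promoters_for_tissue are hypothetical, and the table holds only the promoter and tissue pairs named in the preceding paragraph.

```python
# Promoter -> tissues, transcribed from the list in the text above.
# The table and the helper name are illustrative assumptions only.
TISSUE_SPECIFIC_PROMOTERS = {
    "lck": ["thymocytes"],
    "human CD2": ["T-cells", "thymocytes"],
    "alpha A-crystallin": ["eye lens"],
    "alpha-calcium-calmodulin-dependent kinase II": ["hippocampus", "neocortex"],
    "whey acidic protein": ["mammary gland"],
    "aP2 enhancer/promoter": ["adipose tissue"],
    "aquaporin-2": ["renal collecting duct"],
    "mouse myogenin": ["skeletal muscle"],
    "retinoblastoma gene": ["nervous system"],
}


def promoters_for_tissue(tissue):
    """Return the listed promoters annotated as active in the given tissue."""
    wanted = tissue.strip().lower()
    return sorted(
        name
        for name, tissues in TISSUE_SPECIFIC_PROMOTERS.items()
        if wanted in (t.lower() for t in tissues)
    )
```

A tissue that is not in the table simply yields an empty list, in which case a promoter active in all cell types or an inducible system might be considered instead.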

The expression of nucleic acid binding polypeptides such as zinc finger
polypeptides can also be controlled by small molecule induction or other inducible
systems such as the tetracycline inducible systems (tet-on and tet-off), the RU-486 or
tamoxifen hormone analogue systems, or the radiation-inducible early growth response
gene-1 (EGR1) promoter, all of which are commercially available. By using such
inducible promoter systems, transgenic lines can be established which carry a zinc finger
chimeric polypeptide but express it only after addition of an inducer molecule. Thus the
genes encoding the zinc finger polypeptides or other nucleic acid binding polypeptides
can be expressed (or not expressed) in response to the small molecule, which can be
easily administered. These systems may also allow the time and amount of polypeptide
expression to be regulated.

Expression vectors typically contain expression cassettes that carry all the
additional elements required for efficient expression of the nucleic acid in the host cell.
Additional elements are enhancer sequences, polyadenylation and transcriptional
termination signals, ribosome binding sites, and translational termination sequences.

Transcription of DNA by higher eukaryotes may be increased by inserting an
enhancer sequence into the vector. Enhancers are relatively orientation and position
independent. Many enhancer sequences are known from mammalian genes (e.g. elastase
and globin). However, typically one will employ an enhancer from a eukaryotic cell
virus. Examples include the SV40 enhancer on the late side of the replication origin
(approx. bp 100-270) and the CMV early promoter enhancer. The enhancer may be
spliced into the vector at a position 5' or 3' to the gene encoding the zinc finger
polypeptide or nucleic acid binding polypeptide, but is preferably located at a site 5' from
the promoter.

It has also been shown that the expression of a heterologous gene in an animal cell
may be enhanced by retaining intron sequences (as opposed to using a cDNA clone). For
example, intron 1 of the human CD2 gene has been shown to enhance the level of
expression of CD2 in human cells (Festenstein, R. et al. 1996 Science 271: 1123).

Advantageously, a eukaryotic expression vector encoding a nucleic acid binding
protein may comprise a locus control region (LCR). LCRs are capable of directing
high-level integration site-independent expression of transgenes integrated into host cell
chromatin. This is particularly important where the gene encoding the zinc finger
polypeptide or the nucleic acid binding polypeptide is to be expressed over extended
periods of time, for applications such as transgenic animals and gene therapy, as gene
silencing of integrated heterologous DNA - especially of viral origin - is known to occur
(Palmer, T. D. et al., Proc. Natl. Acad. Sci. USA 88: 1330-1334 (1991); Harbers, K. et al.,
Nature 293: 540-542 (1981); Jahner, D. et al., Nature 298: 623-628 (1992); and Chen, W.
Y. et al., Proc. Natl. Acad. Sci. USA 94: 5798-5803 (1997)). Typical LCRs are
exemplified by the human β-globin cluster, and the HS-40 regulatory region from the
α-globin locus.

Eukaryotic vectors may also contain sequences necessary for the termination of
transcription and for stabilising the mRNA transcript. Such sequences are commonly
available from the 5' and 3' untranslated regions of eukaryotic or viral DNAs, and are
known in the art. These regions contain nucleotide segments transcribed as
polyadenylated fragments in the untranslated portion of the mRNA encoding the relevant
polypeptide. An appropriate terminator of transcription is fused downstream of the gene
encoding the selected nucleic acid binding polypeptide such as a zinc finger protein. Any
of a number of known transcriptional terminators, RNA polymerase pause sites and
polyadenylation enhancing sequences can be used at the 3' end of the nucleic acid
encoding, for example, a zinc finger polypeptide (see, for example, Richardson, J. P. Crit.
Rev. Biochem. Mol. Biol. 28:1-30 (1993); Yonaha, M. & Proudfoot, N. J. EMBO J. 19:
(2000); Ashfield, R. et al., EMBO J. 10: 4197-4207 (1991); Hirose, Y. &
Manley, J. L. Nature 395: 93-96 (1998)).

The nucleic acid binding polypeptides are generally targeted to the cell nucleus so
that they are able to interact with host cell DNA and bind to the appropriate DNA target
in the nucleus and regulate transcription. To effect this, a nuclear localisation sequence
(NLS) is incorporated in frame with the expressible nucleic acid binding polypeptide
(e.g., zinc finger polypeptide) gene construct. The NLS can be fused either 5' or 3' to the
sequence encoding the binding protein, but preferably it is fused to the C-terminus of the
chimeric polypeptide.
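
The requirement that the NLS be incorporated in frame can be illustrated with a minimal Python sketch that appends NLS codons to the 3' end of a coding sequence, immediately before the stop codon. Everything in it is an assumption made for illustration: the peptide PKKKRKV is the commonly cited minimal SV40 large T-antigen NLS and is not recited in this text, the codon chosen for each residue is arbitrary, and the function names and the toy open reading frame are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Commonly cited minimal SV40 large T-antigen NLS (assumption, not from the text).
SV40_NLS = "PKKKRKV"

# One arbitrary codon per residue; a real design would consider host codon usage.
CODONS = {"P": "CCT", "K": "AAG", "R": "CGT", "V": "GTT"}


def encode_peptide(peptide):
    """Translate a short peptide into DNA using the fixed codon table above."""
    return "".join(CODONS[residue] for residue in peptide)


def fuse_nls_c_terminal(orf, nls_peptide=SV40_NLS):
    """Append NLS codons in frame at the 3' end of an ORF, before the stop codon."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 to stay in frame")
    if orf[-3:] in STOP_CODONS:
        body, stop = orf[:-3], orf[-3:]
    else:
        body, stop = orf, "TAA"  # supply a stop codon if the ORF lacks one
    return body + encode_peptide(nls_peptide) + stop
```

Because whole codons are inserted between the last sense codon and the stop codon, the reading frame of the binding polypeptide is preserved and the NLS is translated at its C-terminus.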

The NLS of the wild-type Simian Virus 40 Large T-Antigen (Kalderon et al.
(1984) Cell 37: 801-813; and Markland et al. (1987) Mol. Cell. Biol. 7: 4255-4265) is an
appropriate NLS and provides an effective nuclear localisation mechanism in animals.
However, several alternative NLSs are known in the art and can be used instead of the
SV40 NLS sequence. These include the NLSs of TGA-1A and TGA-1B.

Composite binding polypeptides can comprise tag sequences to facilitate studies
and/or preparation of such molecules. Tag sequences may include FLAG-tags, myc-tags,
6his-tags, hemagglutinin tags or any other suitable tag known in the art.



Moreover, the nucleic acid binding protein gene according to the invention
preferably includes a secretion sequence in order to facilitate secretion of the polypeptide
from bacterial hosts, such that it will be produced as a soluble native peptide rather than
in an inclusion body. The peptide may be recovered from the bacterial periplasmic space,
or the culture medium, as appropriate.

Construction of vectors employs conventional ligation techniques. Isolated
plasmids or DNA fragments are cleaved, tailored, and religated in the form desired to
generate the plasmids required. If desired, analysis to confirm correct sequences in the
constructed plasmids is performed in a known fashion. Suitable methods for constructing
expression vectors, preparing in vitro transcripts, introducing DNA into host cells, and
performing analyses for assessing nucleic acid binding protein expression and function
are known to those skilled in the art. Gene presence, amplification and/or expression
may be measured in a sample directly, for example, by conventional Southern blotting,
Northern blotting to quantify the transcription of mRNA, dot blotting (DNA or RNA
analysis), or in situ hybridisation, using an appropriately labelled probe which may be
based on a sequence provided herein. Those skilled in the art will readily envisage how
these methods may be modified, if desired.

f. Applications of Composite Binding Polypeptides 

Nucleic acid binding proteins according to the invention can be employed in a wide
variety of applications, including diagnostics and as research tools, and also in therapeutic
applications and in transgenic organisms.

In Vitro Applications 

Poly-zinc finger peptides of this invention may be employed as diagnostic tools for
identifying the presence of nucleic acid molecules in a complex mixture. Nucleic acid
binding molecules according to the invention can differentiate single base pair changes in
target nucleic acid molecules.

Accordingly, the invention provides methods for determining the presence of a target
nucleic acid molecule, wherein the target nucleic acid molecule comprises a target
sequence, comprising the steps of:

a) preparing a nucleic acid binding protein, by a method set forth above, which is specific
for the target nucleic acid sequence;

b) exposing a test system to the nucleic acid binding protein under conditions which
promote binding of the protein to the target sequence, and removing any nucleic acid
binding protein which remains unbound;

c) testing for the presence of the nucleic acid binding protein in the test system;

wherein, if the nucleic acid binding protein is detected, the target nucleic acid molecule is
present and, if the nucleic acid binding protein is not detected, the target nucleic acid
molecule is not present. In additional embodiments, quantitation of the amount of nucleic
acid binding protein allows quantitation of the amount of the target nucleic acid molecule
present in the test system.
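
The decision made in step (c), and the optional quantitation, can be sketched as follows. This is a minimal illustration under stated assumptions rather than a prescribed protocol: the function name, the background subtraction, and the proteins_per_target stoichiometry parameter are hypothetical, and the signals are assumed to have already been calibrated into units of protein amount.

```python
def interpret_binding_assay(bound_signal, background_signal, proteins_per_target=1.0):
    """Interpret the amount of binding protein retained after the wash in step (b).

    bound_signal and background_signal are assumed to be calibrated amounts of
    binding protein; proteins_per_target is a hypothetical stoichiometry.
    """
    net_bound = bound_signal - background_signal
    if net_bound <= 0:
        # No binding protein detected above background: target judged absent.
        return {"target_present": False, "estimated_target_amount": 0.0}
    # Binding protein detected: target judged present, amount scaled by stoichiometry.
    return {
        "target_present": True,
        "estimated_target_amount": net_bound / proteins_per_target,
    }
```

For example, interpret_binding_assay(5.0, 1.0) reports the target as present with an estimated amount of 4.0, whereas a signal equal to background reports it as absent.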

In a preferred embodiment, the nucleic acid binding molecules of the invention can be
incorporated into an ELISA assay. For example, phage displaying composite binding
polypeptides can be used to detect the presence of the target nucleic acid, and visualised
using enzyme-linked anti-phage antibodies.

Further improvements to the use of phage expressing a composite binding polypeptide for
diagnosis can be made, for example, by co-expressing a marker protein fused to the minor
coat protein (gVIII) of a filamentous bacteriophage. Since detection with an anti-phage
antibody would then be unnecessary, the time and cost of each diagnosis would be further
reduced. Depending on the requirements, suitable markers for display might include
fluorescent proteins (A. B. Cubitt, et al. (1995) Trends Biochem. Sci. 20, 448-455; T. T.
Yang, et al. (1996) Gene 173, 19-23), or an enzyme such as alkaline phosphatase (J.
McCafferty, R. H. Jackson, D. J. Chiswell, (1991) Protein Engineering 4, 955-961).
Labelling different types of diagnostic phage with distinct markers would allow multiplex
screening of a single nucleic acid sample. Nevertheless, even in the absence of such
refinements, the basic ELISA technique is reliable, fast, simple and particularly
inexpensive. Moreover it requires no specialised apparatus, nor does it employ hazardous
reagents such as radioactive isotopes, making it amenable to routine use in the clinic. The
major advantage of the protocol is that it obviates the requirement for gel electrophoresis,
and so opens the way to automated nucleic acid diagnosis.

The invention provides nucleic acid binding proteins that have exquisite specificity. The
invention lends itself, therefore, to the design of any molecule of which specific nucleic
acid binding is required. For example, the proteins according to the invention may be
employed in the manufacture of chimeric restriction enzymes, in which a nucleic acid
cleaving domain is fused to a nucleic acid binding domain comprising a zinc finger as
described herein.

In Vivo Applications 

The invention further provides composite binding polypeptides (and nucleic acids
encoding them) that may be used in transgenic organisms (such as non-human animals),
as therapeutic agents, and in gene therapy applications.

A transgenic animal is an animal, preferably a non-human animal, containing at least one
foreign gene, called a transgene, in its genetic material. Preferably, the transgene is
contained in the animal's germ line such that it can be transmitted to the animal's
offspring. Transgenic animals may carry the transgene in all their cells or may be
genetically mosaic.

Constructs useful for creating transgenic animals according to the invention comprise
genes encoding nucleic acid binding polypeptides, optionally under the control of nucleic
acid sequences directing their expression in cells of a particular lineage. Alternatively,
nucleic acid binding polypeptide encoding constructs may be under the control of
non-lineage-specific promoters, and/or inducibly regulated. Typically, DNA fragments on the
order of 10 kilobases or less are used to construct a transgenic animal (Reeves, 1998,
New Anat. 253:19). A transgenic animal expressing one transgene can be crossed to a
second transgenic animal expressing a second transgene such that their offspring will carry
both transgenes.

Although the majority of previous studies have involved transgenic mice, other species of
transgenic animal have also been produced, such as rabbits, sheep, pigs (Hammer et al.,
1985, Nature 315:680-683; Kumar, et al., U.S. 05922854; Seebach, et al., U.S. Patent
No. 6,030,833) and chickens (Salter et al., 1987, Virology 157:236-240). Transgenic
animals are currently being developed to serve as bioreactors for the production of useful
pharmaceutical compounds (Van Brunt, 1988, Bio/Technology 6:1149-1154; Wilmut, et
al., 1988, New Scientist (July 7 issue) pp. 56-59). Up-regulation of endogenous or
exogenous genes expressing useful polypeptides, such as therapeutic polypeptides, by
means of a heterologous nucleic acid binding polypeptide, may be used to produce such
polypeptides in transgenic animals. Preferably, the polypeptides are secreted into an
extractable fluid, such as blood or mammary fluid (milk), to enable easy isolation of the
polypeptide.


Furthermore, the invention provides the use of polypeptide fusions comprising an
integrase, such as a viral integrase, and a nucleic acid binding protein according to the
invention to target nucleic acid sequences in vivo (Bushman, (1994) PNAS (USA)
91:9233-9237). In gene therapy applications, the method may be applied to the delivery
of functional genes into defective genes, or the delivery of a heterologous nucleic acid in
order to disrupt an endogenous gene. Alternatively, genes may be delivered to known,
repetitive stretches of nucleic acid, such as centromeres, together with an activating
sequence such as an LCR. This would represent a route to the safe and predictable
incorporation of nucleic acid into the genome.


In conventional therapeutic applications, nucleic acid binding proteins according to this
embodiment may be used to specifically eliminate cells having mutant vital proteins. For
example, if a mutant ras gene is targeted, cells comprising this mutant gene will be
destroyed because ras is essential to cellular survival. Alternatively, the action of
transcription factors can be modulated, preferably reduced, by administering to the cell
agents which bind to the binding site specific for the transcription factor. For example,
the activity of HIV tat may be reduced by binding proteins specific for HIV TAR.

Moreover, binding proteins according to the invention can be coupled to toxic molecules,
such as nucleases, which are capable of causing irreversible nucleic acid damage and cell
death. Such agents are capable of selectively destroying cells that comprise a mutation in
their endogenous nucleic acid.

Nucleic acid binding proteins and derivatives thereof as set forth above may also be
applied to the treatment of infections and the like in the form of organism-specific
antibiotic or antiviral drugs. In such applications, the binding proteins can be coupled to
a nuclease or other nuclear toxin and targeted specifically to the nucleic acids of
microorganisms.

Transgenic animals comprising transgenes, optionally integrated within the genome, and
expressing heterologous zinc finger and other nucleic acid binding polypeptides from
transgenes, may be created by a variety of methods. Methods for producing transgenic
animals are known in the art, and are described by Gordon, J. & Ruddle, F.H. Science
214: 1244-1246 (1981); Jaenisch, R. Proc. Natl. Acad. Sci. USA 73: 1260-1264 (1976);
Gossler et al. (1986) Proc. Natl. Acad. Sci. USA 83:9065-9069; Hogan et al.,
Manipulating the Mouse Embryo: A Laboratory Manual, (1988); and U.S. Pat. Nos.
5,175,384; 5,434,340 and 5,591,669.

Pharmaceutical Preparations 


The invention likewise relates to pharmaceutical preparations which contain the
compounds according to the invention or pharmaceutically acceptable salts thereof as
active ingredients, and to processes for their preparation.






The pharmaceutical preparations according to the invention which contain the compound
according to the invention or pharmaceutically acceptable salts thereof are those for
enteral, such as oral, furthermore rectal, and parenteral administration to (a)
warm-blooded animal(s), the pharmacological active ingredient being present on its own or
together with a pharmaceutically acceptable carrier. The daily dose of the active
ingredient depends on the age and the individual condition and also on the manner of
administration.


The novel pharmaceutical preparations contain, for example, from about 10% to about
80% (or any integral percentage therebetween), preferably from about 20% to about
60%, of the active ingredient. Pharmaceutical preparations according to the invention for
enteral or parenteral administration are, for example, those in unit dose forms, such as
sugar-coated tablets, tablets, capsules or suppositories, and furthermore ampoules. These
are prepared in a manner known per se, for example by means of conventional mixing,
granulating, sugar-coating, dissolving or lyophilising processes. Thus, pharmaceutical
preparations for oral use can be obtained by combining the active ingredient with solid
carriers, if desired granulating a mixture obtained, and processing the mixture or granules,
if desired or necessary, after addition of suitable excipients to give tablets or sugar-coated
tablet cores.

Suitable carriers are, in particular, fillers, such as sugars, for example lactose, sucrose,
mannitol or sorbitol, cellulose preparations and/or calcium phosphates, for example
tricalcium phosphate or calcium hydrogen phosphate, furthermore binders, such as starch
paste, using, for example, corn, wheat, rice or potato starch, gelatin, tragacanth,
methylcellulose and/or polyvinylpyrrolidone, if desired, disintegrants, such as the
abovementioned starches, furthermore carboxymethyl starch, crosslinked
polyvinylpyrrolidone, agar, alginic acid or a salt thereof, such as sodium alginate;
auxiliaries are primarily glidants, flow-regulators and lubricants, for example silicic acid,
talc, stearic acid or salts thereof, such as magnesium or calcium stearate, and/or
polyethylene glycol. Sugar-coated tablet cores are provided with suitable coatings which,
if desired, are resistant to gastric juice, using, inter alia, concentrated sugar solutions
which, if desired, contain gum arabic, talc, polyvinylpyrrolidone, polyethylene glycol
and/or titanium dioxide, coating solutions in suitable organic solvents or solvent mixtures
or, for the preparation of gastric juice-resistant coatings, solutions of suitable cellulose
preparations, such as acetylcellulose phthalate or hydroxypropylmethylcellulose
phthalate. Colorants or pigments, for example to identify or to indicate different doses of
active ingredient, may be added to the tablets or sugar-coated tablet coatings.

Other orally utilisable pharmaceutical preparations are hard gelatin capsules, and also soft
closed capsules made of gelatin and a plasticiser, such as glycerol or sorbitol. The hard
gelatin capsules may contain the active ingredient in the form of granules, for example in
a mixture with fillers, such as lactose, binders, such as starches, and/or lubricants, such as
talc or magnesium stearate, and, if desired, stabilisers. In soft capsules, the active
ingredient is preferably dissolved or suspended in suitable liquids, such as fatty oils,
paraffin oil or liquid polyethylene glycols, it also being possible to add stabilisers.

Suitable rectally utilisable pharmaceutical preparations are, for example, suppositories,
which consist of a combination of the active ingredient with a suppository base. Suitable
suppository bases are, for example, natural or synthetic triglycerides, paraffin
hydrocarbons, polyethylene glycols or higher alkanols. Furthermore, gelatin rectal
capsules which contain a combination of the active ingredient with a base substance may
also be used. Suitable base substances are, for example, liquid triglycerides, polyethylene
glycols or paraffin hydrocarbons.

Suitable preparations for parenteral administration are primarily aqueous solutions of an
active ingredient in water-soluble form, for example a water-soluble salt, and furthermore
suspensions of the active ingredient, such as appropriate oily injection suspensions, using
suitable lipophilic solvents or vehicles, such as fatty oils, for example sesame oil, or
synthetic fatty acid esters, for example ethyl oleate or triglycerides, or aqueous injection
suspensions which contain viscosity-increasing substances, for example sodium
carboxymethylcellulose, sorbitol and/or dextran, and, if necessary, also stabilisers.

The dose of the active ingredient depends on the warm-blooded animal species, the age
and the individual condition and on the manner of administration. For example, an
approximate daily dose of about 10 mg to about 250 mg is to be estimated in the case of
oral administration for a patient weighing approximately 75 kg.

g. Transformation and Transfection

DNA can be stably incorporated into cells or can be transiently expressed using
methods known in the art and described below. Stably transfected cells can be prepared
by transfecting cells with an expression vector containing a selectable marker gene, and
growing the transfected cells under conditions selective for cells expressing the marker
gene. To prepare transient transfectants, cells are transfected with a reporter gene to
monitor transfection efficiency.

There are many well-known methods of introducing foreign nucleic acids into
host cells, which include electroporation, calcium phosphate co-precipitation, particle
bombardment, microinjection, naked DNA, liposomes, lipofection, and viral infection, etc.
(see, e.g., Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Second
Edition, Cold Spring Harbor Laboratory Press, and Mountain, A. Trends Biotechnol. 18:
119-128 (2000) for a review). Any of the above methods can be used, as long as it is
compatible with the host cell. Linear nucleic acid molecules have been found to be more
efficiently incorporated into mammalian genomes than circular plasmids. Additionally,
nucleic acid molecules may be delivered to specific target tissues or to individual cells.
Viral based gene transfer is often favoured for introducing nucleic acids into mammalian
cells and specific target tissues, and several viral delivery approaches are in clinical trials
for gene therapy applications. However, non-viral methods are attractive due to their
greater safety for the purpose of gene transfer to humans.

The preferred methods of particle bombardment use biolistics made from gold (or
tungsten). Compared with other transfection procedures, particle bombardment requires a
low amount of nucleic acid and a smaller number of cells, making the procedure
generally more efficient (Heiser, W. C. Anal. Biochem. 217: 185-196 (1994); Klein, T. M.
& Fitzpatrick-McElligott, S. Curr. Opin. Biotechnol. 4: 583-590 (1993)). The procedure
is particularly suited for organisms that are difficult to transfect, and for introducing DNA
into organelles, such as mitochondria and chloroplasts. Although generally used for ex
vivo applications, the procedure is also suitable for in vivo transfection of skin tissue.
Suitable methods are known in the art and described, for instance, in US Patent Nos.
5,489,520 and 5,550,318. See also, Potrykus (1990) Bio/Technol. 8: 535-542; and
Finnegan et al. (1994) Bio/Technol. 12: 883-888.

Microinjection is a common method of nucleic acid delivery to isolated cells
(Palmiter, R. D. & Brinster, R. L. Annu. Rev. Genet. 20: 465-499 (1986); Wall, R. J. et
al., J. Cell Biochem. 49: 113-120 (1992); Chan, A. W. et al., Proc. Natl. Acad. Sci. USA
95: 14028-14033 (1998)). DNA is generally injected into cells and the cells may then be
re-introduced into animals. Procedures for such a technique are described in US Pat. Nos.
5,175,384 and 5,434,340, and improvements to the technique are described in WO
00/69257.

Efficient gene transfer in vivo can be obtained following local injection of
naked DNA. While expression of injected DNA in skin lasts for only a few days, injected
DNA in mouse skeletal muscle has been shown to last for up to nine months (Wolff, J. A.
et al., Hum. Mol. Genet. 1: 363-369 (1992)). Naked DNA is particularly suited to gene
therapy for preventive and therapeutic vaccines.

Cationic liposomes containing cholesterol are particularly suited for delivery of
nucleic acids to humans as they are biodegradable and stable in the bloodstream.
Liposomes can be injected intravenously or subcutaneously, or inhaled as an aerosol.
Stribling et al. (1992) Proc. Natl. Acad. Sci. USA 89:11,277-11,281. Liposomes can be
targeted to certain cell types by incorporating ligands, receptors or antibodies
(immunolipids) into the lipid membrane (U.S. Pat. No. 4,957,773). On contacting target
cells, entry of DNA from liposomes is via endocytosis and diffusion. Preparations of
lipid formulations are commercially available and methods for their use are well
documented (Bogdanenko, E. V. et al., Vopr. Med. Khim. 46: 226-245 (2000); Natsume,
A. et al., Gene Ther. 6: 1626-1633 (1999)).

Uptake of DNA into animal cells can also be enhanced by using transfection
agents. "Transfecting agent", as utilised herein, means a composition of matter added to
the genetic material for enhancing the uptake of exogenous DNA segment(s) into a
eukaryotic cell, preferably a mammalian cell, and more preferably a mammalian germ
cell. The enhancement is measured relative to the uptake in the absence of the
transfecting agent. Examples of transfecting agents include
adenovirus-transferrin-polylysine-DNA complexes. These complexes generally augment the
uptake of DNA into the cell and reduce its breakdown during its passage through the
cytoplasm to the nucleus of the cell. These complexes can be targeted to the male germ
cells using specific ligands which are recognised by receptors on the cell surface of the
germ cell, such as the c-kit ligand or modifications thereof. Other preferred transfecting
agents include lipofectin™, lipofectamine™, DMRIE-C, Superfect, and Effectin (Qiagen),
unifectin, maxifectin, DOTMA, DOGS (Transfectam; dioctadecylamidoglycylspermine),
DOPE (1,2-dioleoyl-sn-glycero-3-phosphoethanolamine), DOTAP
(1,2-dioleoyl-3-trimethylammonium propane), DDAB (dimethyl dioctadecylammonium
bromide), DHDEAB (N,N-di-n-hexadecyl-N,N-dihydroxyethyl ammonium bromide), HDEAB
(N-n-hexadecyl-N,N-dihydroxyethylammonium bromide), polybrene, or poly(ethylenimine)
(PEI). See, for example, Banerjee, R. et al., Novel series of non-glycerol-based cationic
transfection lipids for use in liposomal gene delivery, J. Med. Chem. 42 (21): 4292-99
(1999); Godbey, W. T. et al., Improved packing of poly(ethylenimine)-DNA complexes
increases transfection efficiency, Gene Ther. 6 (8): 1380-88 (1999); Kichler, A. et al.,
Influence of the DNA complexation medium on the transfection efficiency of
lipospermine/DNA particles, Gene Ther. 5 (6): 855-60 (1998); Birchall, J. C. et al.,
Physico-chemical characterisation and transfection efficiency of lipid-based gene delivery
complexes, Int. J. Pharm. 183 (2): 195-207 (1999). These non-viral agents have the
advantage that they facilitate stable integration of xenogeneic DNA sequences into the
vertebrate genome, without size restrictions commonly associated with virus-derived
transfecting agents.

The most critical issues for applications such as gene therapy are the efficient
delivery and appropriate expression of transgenes in host cells. For this purpose, viral
systems are particularly well suited as viruses have evolved to efficiently cross the plasma
membrane of eukaryotic cells and express their nucleic acids in host cells. Suitability of
viral vectors is assessed primarily on their ability to carry foreign nucleic acids and
deliver and express transgenes with high efficiency. Current applications utilise both
RNA and DNA virus based systems, and 70% of gene therapy trials use viral vectors
derived from retroviruses, adenovirus, adeno-associated virus, herpesvirus and pox virus.
See, for example, Flotte et al. (1995) Gene Ther. 2:357-362; Glorioso et al. (1995) Annu.
Rev. Microbiol. 49:675-710; Smith (1995) Annu. Rev. Microbiol. 49:807-838; Prince
(1998) Pathology 30:335-347; and Robbins et al. (1998) Pharmacol. Ther. 80:35-47.
Retroviruses represent the most prominent gene delivery system as they mediate high
gene transfer and expression of therapeutic genes. Members of the DNA virus family
such as adenovirus, adeno-associated virus or herpesvirus are popular due to their
efficiency of gene delivery. Adenoviral vectors are particularly suited when transient
transfection of nucleic acid is preferred. Retroviruses express particular envelope
proteins that bind to specific cell surface receptors on host cells, in order for the virus to
enter the cell. Hence, the type of viral vector used should be determined by the tissue type
to be targeted. See, e.g., Dornburg (1995) Gene Ther. 2:301-310; Gunzburg, et al. (1996)
J. Mol. Med. 74:171-182; Vile et al. (1996) Mol. Biotechnol. 5:139-158; Miller (1997)
"Development and Applications of Retroviral Vectors" Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, New York; Karavanas et al. (1998) Crit. Rev. Oncol.
Hematol. 28:7-30; Hu et al. (2000) Pharmacol. Rev. 52: 493-511; and Walther et al.
(2000) Drugs 60: 249-271 for reviews.

Safety is a critical issue for viral based gene delivery because most viruses are
either pathogens or have pathogenic potential. Generally, when a replication-competent
virus infects an animal cell it can express viral genes and release many new infectious
viral particles in the host organism. Hence, it is very important that during transgene
delivery the host animal does not receive a pathogenic virus with full replication
potential. For this reason, viral-host cell systems have been developed for gene therapy
treatments to prevent the creation of replication-competent viruses. In this method, viral
components are divided between a vector and a helper construct to limit the ability of the
virus to replicate (Miller 1997). The viral vector contains the gene(s) of interest and
cis-acting elements that allow gene expression and replication, but contains deletions of
some or all of the viral proteins. Helper cells (or occasionally, helper virus) are engineered
to express the viral proteins needed to propagate the viral vectors. These new viral particles
are able to infect target cells, reverse transcribe the vector RNA and integrate its DNA
copy into the genome of the host, which can then be expressed. However, the vector
cannot express the viral proteins required to create new infectious particles. Helper cell lines
are known in the art (see Hu, W-S & Pathak, V. K. Pharmacol. Rev. 52: 493-511 (2000),
for a review).

In general, retroviral vectors are able to package reasonably long stretches of
foreign DNA (up to 10 kb). Oncoviruses are a type of retrovirus, which only infect
rapidly dividing cells. For this reason they are especially attractive for cancer therapy.
Murine leukaemia virus (MLV)-based vectors are the most commonly used of this class.
Spleen necrosis virus (SNV), Rous sarcoma virus and avian leukosis virus are other types.
Lentiviral vectors are retroviral vectors that can be propagated to produce high viral titres
and are able to infect non-dividing cells. They are more complex than oncoviruses and
require regulation of their replication cycle. Lentiviral vectors which may be used
include human immunodeficiency virus (HIV-1 and -2) and simian immunodeficiency
virus (SIV) based systems. HIV infects cells of the immune system, most importantly
CD4+ T-lymphocytes, and so may be useful for targeted gene therapy of this cell type.
Another type of retrovirus is the spumavirus. Spumaviruses are attractive because of
their apparent lack of toxicity. Linial (1999) J. Virol. 73:1747-1755.

Adenoviral vectors have high transduction efficiency and are able to transfect a
number of different cell types, including non-dividing cells. They have a high capacity for
foreign DNA and can carry up to 30 kb of non-viral DNA (for a review see, Kochanek, S.
Hum. Gene Ther. 10: 2451-2459 (1999)). Recombinant adenoviral (rAd) vectors are
becoming one of the most powerful gene delivery systems available and have been used
to deliver DNA to post-mitotic neurons of the central nervous system (CNS) (Geddes, B.
J. et al., Front. Neuroendocrinol. 20: 296-316 (1999)), and are used to treat diseases such
as colon cancer (Alvarez et al., Hum. Gene Ther. 5: 597-613 (1997)). Adeno-associated
virus (AAV) vectors and recombinant AAV (rAAV) vectors are proving themselves to be
safe and efficacious for the long-term expression of proteins to correct genetic disease.
Snyder, R. O. (J. Gene Med. 1: 166-175 (1999)) provides a review of gene delivery
approaches using such vectors. Construction of such vectors is described in, for example,
Samulski et al., J. Virol. 63: 3822-3828 (1989), and U.S. Pat. No. 5,173,414.

Many gene therapy trials have been conducted and are underway (over 3,500
people have been treated with gene therapy systems), and several reviews can be studied
for details of the protocols and results (Hwu & Rosenberg, Ann N Y Acad Sci. 1994 May
31;716:188-97; Blaese, Hosp Pract (Off Ed). 1995 Nov 15;30(11):33-40; Blaese, Hosp
Pract (Off Ed). 1995 Dec 15;30(12):37-45; Breau & Clayman, Curr Opin Oncol. 1996
May;8(3):227-31; Dunbar, Annu Rev Med. 1996;47:11-206; Lotze, Cancer J Sci Am.
1996 Mar;2(2):63). The first gene therapy trial was carried out by Blaese et al. (1995), to
correct a genetic disorder known as adenosine deaminase (ADA) deficiency, which leads
to severe immunodeficiency. Several cancer gene therapy strategies are being developed,
which involve eliminating cancer cells by suicide therapy (Oldfield et al., Hum Gene
Ther. 1993 Feb;4(1):39-69), modification of cancer cells to promote immune responses
(Lotze et al., Hum Gene Ther. 1994 Jan;5(1):41-55), and reversion by delivery of a tumor
suppressor gene (Roth et al., Hum Gene Ther. 1996 May 1;7(7):861-74). Another
successful gene therapy trial has been conducted to combat graft-versus-host disease,
which can result following transplant procedures such as bone marrow transplants
(Bonini et al., Science. 1997 Jun 13;276(5319):1719-24). This procedure was carried out
using an HSV-based vector. Several gene therapy treatments are under investigation for
the treatment of HIV-1 infection. Most treatments involve modification of lymphocytes,
ex vivo, to suppress the expression of viral genes, by means of ribozymes, antisense RNA,
mutant trans-dominant regulatory proteins and modification to elicit a host immune
response (Nabel et al., Cardiovasc Res. 1994 Apr;28(4):445-55; Galpin et al., Hum Gene
Ther. 1994 Aug;5(8):997-1017; Morgan RA, Walker R, Hum Gene Ther. 1996 Jun
20;7(10):1281-306, Gene therapy for AIDS using retroviral mediated gene transfer to
deliver HIV-1 antisense TAR and transdominant Rev protein genes to syngeneic
lymphocytes in HIV-1 infected identical twins; Wong-Staal et al., Hum Gene Ther. 1998
Nov 1;9(16):2407-25). Vectors currently in use for gene therapy treatments and animal
tests include those derived from Moloney murine leukemia virus, such as MFG and
derivatives thereof and the MSCV retroviral expression system (Clontech, Palo Alto,
California). Many other vectors are also commercially available.

Viral vectors are especially important in applications when a specific tissue type is
to be targeted, such as for gene therapy applications. There are two available methods for
targeting genes to a specific cell or tissue type. One strategy is designed to control
expression of the required gene using a tissue specific promoter (discussed above), and
another strategy is to control viral entry into cells. Viruses tend to enter specific cell types
according to the envelope proteins that they express. However, by engineering the
envelope proteins to express specific proteins as fusions, such as erythropoietin,
insulin-like growth factor I and single chain variable fragment antibodies, viral vectors can be
targeted to specific cell-types (Kasahara et al., Science. 1994 Nov 25;266(5189):1373-6;
Somia et al., Proc Natl Acad Sci USA. 1995 Aug 1;92(16):7570-4; Jiang et al., J Virol.
1998 Dec;72(12):10148-56; Chadwick et al., J Mol Biol. 1999 Jan 15;285(2):485-94).

In one example of tissue specific targeting in transgenic mice, a novel transgene
delivery system has been developed in which the target tissue type expresses an avian
viral receptor (TVA), under the control of a tissue specific promoter. Transgenic mice
expressing the TVA receptor are then infected with avian leukosis virus, carrying the
transgene(s) of interest (Fisher, G. H. et al., Oncogene 18: 5253-5260 (1999)).

h. Construction of Zinc Finger Libraries

Zinc finger libraries may be constructed from naturally-occurring human zinc
finger modules. Thus, the invention provides libraries of zinc finger modules. Module
libraries according to the invention may be assembled combinatorially into zinc finger
polypeptides. The combinatorial assembly may be carried out biologically, using random
assembly and selection technologies, or in a directed manner under computer control,
assembling desired modules to produce zinc fingers having defined or random specificity.
In accordance with the invention, libraries may be constructed entirely from natural zinc
finger polypeptide modules from which zinc finger polypeptides having any desired
specificity may be isolated. The invention, in its most preferred aspect, does not require
the engineering of the specificity of any zinc finger module in order to produce a zinc
finger polypeptide having specificity for any desired nucleic acid sequence.

Selection of appropriate zinc finger modules for assembly into libraries of
composite binding polypeptides having a predetermined binding specificity can be
accomplished by applying the rules for zinc finger binding specificity set forth herein. In
the case of zinc finger assembly under computer control, a rule table may be used to
select zinc fingers for binding to the target site. Figure 1 shows a flowchart depicting part
of the logic used in the selection of zinc fingers from a natural library in accordance with
the invention. The logic set forth in Figure 1 may be supplemented, for example using
rules relating to zinc finger overlap. Functional testing of zinc fingers for binding to the
desired binding site may be implemented in an automated fashion and integrated with the
zinc finger design system.
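
By way of illustration only, a table lookup of this general kind might be sketched as
follows. The sketch is not the logic of Figure 1: it simply splits a target site into
3-nucleotide subsites and lists library fingers whose predicted subsite matches each one.
The function names, the small FINGER_SITES table (a few entries copied from Example 1)
and the restriction to the codes A, C, G, T, Y and N are assumptions made for the
example. Results are reported in 5' to 3' order of the target; how the chosen fingers
are ordered within the polypeptide (in canonical arrays the N-terminal finger typically
contacts the 3'-most subsite) is left to the user.

```python
# Hypothetical rule-table lookup; names and table contents are illustrative.

# Subset of IUPAC nucleotide codes appearing in the "DNA site" column.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}

# (finger name, predicted 3-nucleotide subsite), copied from Example 1.
FINGER_SITES = [
    ("ZIF268 F1", "GCG"),
    ("ZIF268 F2", "TGG"),
    ("MAZ F1", "AGG"),
    ("MAZ F3", "NGT"),
    ("WT1 F2", "GAG"),
    ("Z263", "GAN"),
    ("ZN75", "TAY"),
]


def site_matches(site, triplet):
    """True if every base of the triplet is allowed by the (degenerate) site."""
    return len(site) == len(triplet) and all(
        base in IUPAC[code] for code, base in zip(site, triplet)
    )


def split_subsites(target):
    """Split a target sequence into consecutive 3-nucleotide subsites."""
    target = target.upper()
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


def candidate_fingers(target):
    """For each subsite (5' to 3'), list fingers predicted to bind it."""
    return [
        (triplet, [name for name, site in FINGER_SITES
                   if site_matches(site, triplet)])
        for triplet in split_subsites(target)
    ]


if __name__ == "__main__":
    for triplet, names in candidate_fingers("GCGTGGGAT"):
        print(triplet, names)
```

A subsite with no candidates returns an empty list, which is the point at which a
design system of this sort would fall back on library screening or functional testing.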



The invention thus provides libraries of zinc finger modules. In one embodiment,
the modules are human zinc finger modules. Preferably, the modules are DNA-binding
zinc finger modules.

In a preferred aspect the invention provides a library of DNA-binding human zinc
finger modules as set out in Example 1 below. Moreover, the invention provides a library
of human zinc finger modules as set forth in Example 2 below. Sub-libraries can be
prepared from either of the libraries of the invention.

The invention furthermore encompasses libraries in which zinc finger modules as
set forth in Examples 1 or 2 herein are combined with other zinc finger modules to
provide further libraries that may be used to generate zinc finger polypeptides.



In a still further aspect, the invention relates to libraries derived from animals
other than humans, for use in said organisms in order to derive some or all of the same
advantages as may be obtained with human zinc fingers for use in humans. Example 3
sets forth databases of zinc fingers from mouse, chicken and plants. Sequences of zinc
fingers can be identified in other organisms by the same means, i.e. by analysis of
sequence information and identification of zinc fingers in accordance with the guidance
given herein.

EXAMPLES

Example 1. List of selected human DNA-binding zinc fingers. 

These fingers have been selected from the human genome on the basis of a prediction that
they have a DNA-binding potential. This prediction is based on coded contacts (WO
96/06166, WO 98/53057, WO 98/53058; WO 98/53059 and WO 98/53060);
accordingly, for each peptide unit, a 3-nucleotide DNA target subsite is shown, as the
preferred sequence to which the zinc finger binds. Hence, by constructing 2- or 3-finger
libraries from these 200 or so units, in the manner described in the Examples infra, there
exists the potential to screen a large variety of novel DNA target sites. Note that the
predicted DNA target subsites listed below are merely intended to be a guide to the
DNA-binding potential. It is anticipated that, in practice, an even wider range of DNA
sequences can be targeted using a library engineered from this database, through the
exertion of a positive selection pressure in the library screening system.
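
As a rough guide to scale, the number of members in such a library can be estimated
with the short sketch below. It assumes, purely for illustration, that any unit may
occupy any finger position and that repeats are allowed, so that a library of n-finger
polypeptides drawn from u units has u^n members; the function names are hypothetical.
The count of 227 units is taken from the database heading below, and the number of
distinct target sites is included only for comparison.

```python
# Illustrative estimate only; assumes free, ordered combination of units.

def library_size(n_units, n_fingers):
    """Number of ordered n-finger combinations drawn from n_units modules."""
    return n_units ** n_fingers


def target_space(n_fingers, subsite_len=3):
    """Number of distinct DNA sites spanned by n contiguous subsites."""
    return 4 ** (n_fingers * subsite_len)


if __name__ == "__main__":
    for n in (2, 3):
        print(n, "fingers:", library_size(227, n), "combinations,",
              target_space(n), "possible target sites")
```

Under these assumptions a 3-finger library has about 1.2 x 10^7 members against
4^9 = 262,144 possible 9-base-pair sites, which is consistent with the expectation
that selection can recover binders for many sites not predicted in the table.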


The fingers listed below are in a format that can be linked with classical wild-type
canonical "TGEKP" (SEQ ID NO:3) linkers (i.e. ...TGEKP - zinc finger peptide
sequence - TGEKP - zinc finger peptide sequence - TGEKP - etc.). For each peptide
sequence, an oligonucleotide is designed to encode the peptide sequence; the
oligonucleotide can then be linked into a library selection system, as described in the
Examples infra.
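
The linkage format just described can be sketched as follows. This is a minimal
illustration, not a prescribed procedure: the function name join_fingers is
hypothetical, linkers are placed only between consecutive fingers (flanking linkers
are omitted), and whitespace is stripped from each peptide so that sequences copied
from the table can be used directly. The three example fingers are ZIF268 F1 to F3
from the table below.

```python
# Hypothetical helper for concatenating finger units with TGEKP linkers.

LINKER = "TGEKP"  # canonical linker named in the text (SEQ ID NO:3)


def join_fingers(fingers, linker=LINKER):
    """Concatenate finger peptide sequences, inserting a linker between each."""
    cleaned = ["".join(f.split()).upper() for f in fingers]
    if not all(cleaned):
        raise ValueError("empty finger sequence")
    return linker.join(cleaned)


if __name__ == "__main__":
    zif268 = [
        "YACPVESCDRRFSRSDELTRHIRIH",  # ZIF268 F1
        "FQCRICMRNFSRSDHLSTHIRTH",    # ZIF268 F2
        "FACDICGRKFARSDERKRHTKIH",    # ZIF268 F3
    ]
    print(join_fingers(zif268))
```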

Database of predicted human DNA-binding zinc fingers 

227 finger units



Zinc finger 


DNA site 


SEQ ID Peptide sequence 






NO 


ZIF268 Fl 


GCG 


3 1 YACPVESCDRRFSRSDELTRHIRIH 


ZIF268 F2 


TGG 


32 FQCRI CMRNFSRSDHLSTHIRTH 


ZIF268 F3 


GCG 


33 FACDICGRKFARSDERKRHTKIH 


Kr-likel3 


NGT 


3 4 HKCHYAGCEKVYGKS SHLKAHLRTH 


MAZ Fl 


AGG 


3 5 YQCPVCQQRFKRKDRMS YHVRSH 
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MAZ F2 


TGG 


36 


YNCSHCGKSFSRPDHLNSHVRQVH 


MAZ F3 


NGT 


37 


FKCEKCEAAFATKDRLRAHTVRH 


TIEG2 (SP1)F3 


GGG 


38 


FVCPVCDRRFMRSDHLTKHARRH 


SPl Fl 


GGG 


39 


HKCHYAGCEKVYGKS SHLKAHLRTH 


SPl F2 


GCG 


40 


FACSWQDCNKKFARSDELARHYRTH 


SPl F3 


GGG 


41 


FSCPICEKRFMRSDHLTKHARRH 


WTl Fl 


TGT 


42 


FMCAYPGCNKRYFKLSHLQMHSRKH 


WTl F2 


GAG 


43 


YQCDFKDCERRFSRSDQLKRHQRRH 


WTl F3 


TGG 


44 


FQCKTCQRKFSRSDHLKTHTRTH 


WTl F4 


GCG 


45 


FSCRWPSCQKKFARSDELVRHHNMH 


TYYl 


TAT 


46 


FQCTFEGCGKRFSLDFNLRTHVRI H 


TYYl 


NAA 


47 


YVCPFDGCNKKFAQSTNLKSHILTH 


TF3A 


GGG 


48 


FVCDYEGCGKAFIRDYHLSRHILTH 


TF3A 


GGC 


49 


FKCTQEGCGKHFASPSKLKRHAKAH 


MAZ 


GGC 


50 


HACEMCGKAFRDVYHLNRHKLSH 


GLIl 


GCA 


51 


YMCEHEGCSKAFSNASDRAKHQNRTH 


ZIC3 


GCA 


52 


FKCEFEGCDRRFANSSDRKKHMHVH 


SP4 


NGG 


53 


HI CHIEGCGKVYGKTSHLRAHLRWH 


SP2 


NTG 


54 


HVCHI PDCGKTFRKTSLLRAHVRLH 


BTEl 


NGG 


55 


HKCPYSGCGKVYGKSSHLKAHYRVH 


GLI2 


TAG 


56 


HKCTFEGCS KAYSRLENLKTHLRSH 


Q14872 


TAT 


57 


YQCTFEGCPRTYSTAGNLRTHQKTH 


Q14872 


TGC 


58 


FRCDHDGCGKAFAASHHLKTHVRTH 


ZIC3 


TAG 


59 


FPCPFPGCGKI FARSENLKIHKRTH 


Z143 


CTT 


60 


FKCPFEGCGRSFTTSNIRKVHVRTH 


Z143 


CGT 


61 


FRCEYDGCGKLYTTAHHLKVHERSH 


000153 


AAT 


62 


FMCHESGCGKQFTTAGNLKNHRRI H 


Z143 


AAC 


63 


YYCTEPGCGRAFASATNYKNHVRIH 


Q14872 


TCT 


64 


FVCNQEGCGKAFLTSHSLRIHVRVH 


000153 


TGT 


65 


F I CPAEGCGKS F YVLORLKVHMRTH 


Q14872 


GCT 


66 


FNCESEGCSKYFTTLSDLRKHIRTH 


Z143




67 


YRCSEDNCTKS PKTSGDTjOKHTRTH 


BTBl 


GCG 


68 


FPCTWPDCLKKFSRSDELTRHYRTH 


015391 


TAA 


69 


FVCPFDVCNRKFAQSTNLKTHILTH 


Z143 


GNC 


70 


YVCTVPGCDKRFTEYSSLYKHHWH 


043591 


GGT 


71 


HVCEHCNAAFRTNYHLQRHVFIH 


BCL6 


TAG 


72 


YRCNICGAQFNRPANLKTHTRIH


075626 


TAG 


73 


HECQVCHKRFSSTSNLKTHLRLH 


075626 


YAA 


74 


YECNVCAKTFGQLSNLKVHLRVH


BCL6 


NGA 


75 


YKCETCGARFVQVAHLRAHVLIH 
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075626 


GGA 


76 


FKCQTCNKGFTQLAHLQKHYLVH 


ZN45 


N(N/T)A 


77 


YRCDVCGKRFRQRSYLQAHQRVH


BCL6 


YTY 


78 


YPCEICGTRFRHLQTLKSHLRIH 


GFI1


GCA 


79 


YPCQYCGKRFHQKSDMKKHTFIH 


Z263 


GAN 


80 


YQCNICGKCFSCNSNLHRHQRTH 


ZN75 


TAY 


81 


YRCSWCGKSFSHNTNLHTHQRIH 


Z186 


TTT(YYY) 


82 


YKCIECGKTFTVNQLLTLHHRTH


Z136 


TTT (YYY) 


83 


FKCKQCGKAFSCSPTLRIHERTH 


Z136 


TGA 


84 


YKCKVCGKAFDYPSRFRTHERSH 


Z136 


TTT (YYY) 


85 


YKCKVCGKPFHSLSSFQVHERIH 


Z177 


TTA 


86 


YECKECGKAFRNSSCLRVHVRTH 


Z136 


TNN 


87 


FECKRCGKAFRSSSSFRLHERTH 


060765 


A/T-YT 


88 


YRCNECGKGFTSISRLNRHRIIH


ZN42 


TYT 


89 


YHCGECGLGFTQVSRLTEHQRIH 


ZN42 


COG 


90 


FVCGDCGQGFVRSARLEEHRRVH 


014913 


TCG 


91 


YKCEKCGKGFFRS SDLQHHQKIH 


014913 


C-G/T-G 


92 


YKCEECGKGFSRS SKLQEHQT IH 


ZN45 


YYC 


93 


YKCEECGKGFCRASNLLDHQRGH 


ZN45


AAA 


94 


YKCEECGKGFSQASNLLAHQRGH 


ZN45 


NAG 


95 


YQCEECGKGFCRASNFLAHRGVH 


Z239 


YYG 


96 


YKCEQCGKGFTRSSSLLIHQAVH 


094892 


YNY 


97 


YRCSECGKGFIVNSGLMLHQRTH


ZN45 


AAY 


98 


YQCAECGKGFSVGSQLQAHQRCH 


ZN45 


NGY 


99 


YKCEECGKGFS VGSHLQAHQ I SH 


ZN45 


YCG 


100 


YQCDACGKGPSRSSDPNIHFRVH 


ZN45 


CCG 


101 


YKCGTCGKGFSRSSDLNVHCRIH


ZN45 


TGA 


102 


YKCNACGKSFSYSSHLNIHCRIH


Z239 


TCA 


103 


YQCYECGKGFSQSSDLRIHLRVH 


Z239 


YAA 


104 


YKCGECGKGFSQSSNLHIHRCIH


Z239 


YGA 


105 


YKCDKCGKGFSQSSKLHIHQRVH 


Z239 


CGA 


106 


YHCGKCGKGPSQS SKLLIHQRVH 


060765 


AYA 


107 


FKCSECGRAFSQSASLIQHERIH


060792 


GYY 


108 


YECKECGKAFIRSSSLAKHERIH


ZN07 


ATA 


109 


YPCKECGKAFSQSSTLAQHQRMH 


043296 


AYY 


110 


YKCSECGKAFSRSSSLTQHQRMH 


Z134 


ATG 


111 


YKCSECGKAFSRKDTLVQHQRIH 


Z134 


ATG 


112 


YECSECGKAFSRKATLVQHQRIH 


ZN84 


AYC 


113 


YECSECGKAFSEKLSLTNHQRIH


Z191 


AYG 


114 


YGCVECGKAFSRSSILVQHQRVH


ZN24 


ACG 


115 


YGCVECGKAFSRSSILVQHQRVH 
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043338 


GTA 


116 


YVCGQCGKS FSQRATL I KHHRVH 


043339 


GTA 


117 


YECSQCGKSPSQKATLVKHQRVH 


043338 


AYA 


118 


YDCGQCGKSFIQKSSLIQHQWH 


043339 


ANA 


119 


YECGQCGKSFSQKSGLIQHQWH 


043338 


CAA 


120 


YECGECGKSFSQSSNLIEHCRIH 


Q13398 


AAA 


121 


YECGECGKSFSQRSNLMQHRRVH 


Z135 


CYA 


122 


YECGECGKAFSQSTLLTEHRRIH 


Q13398 


ACA 


123 


YECSECGKSFSQSSSLIQHRRVH 


014709 


AAA 


124 


YKCNECGKAFSQSAYLLNHQRIH 


014709 


CAA 


125 


YKCNECGKVFSQNAYLIDHQRLH 


014709 


CAA 


126 


YKCTECGKAFTQSAYLFDHQRLH 


014709 


CAA 


127 


YKCDECGKTFAQTTYLIDHQRLH 


060792 


AAA 


128 


YNCNECRKTF SQS T YL I QHQR I H 


015535 


ANA 


129 


YHCKECGKVFSQSAGLIQHQRIH 


Q15776 


(a) 


TNA 


130 


YHCKECGKAFSQNTGLILHQRIH 


Q15776 


(b) 


TNA 


131 


YQCNQCGKAFSQSAGLILHQRIH 




CNA 


132 


YKCNECGRAFSQKSGLIEHQRIH 


ZN84 


AAC 


133 


YGCNECGRAFSEKSNLINHQRIH 


Z191 


ANA 


134 


YKCLECGKAFSQNSGLINHQRIH 


ZN24 


ANA 


135 


YKCLECGKAFSQNSGLINHQRIH 


O60765 


AYA 


136 


YRCEECGISFGQSSALIQHRRIH 


ZN07 


YYA 


137 


YRCEECGKAFGQSSSLIHHQRIH 


043340 


ACA 


138 


YECDECGKS YSQS SALLQHRRVH 


Z135 


CYY 


139 


YKCQECGKAFSHSSALIEHHRTH 


043340 


AYA 


140 


YDCSECGKSFRQVSVLIQHQRVH 


043340 


AYA 


141 


YVCSECGKSFGQKSVLIQHQRVH 


Q13398 


AYT 


142 


YQCSQCGKSFGCKSVLI QHQRVH 


015535 


GNA 


143 


HKCDECGKS FTQS SGL IRHQRI H 


Q15776 


GNA 


144 


HKCDECGKSFAQSSGLVRHWRIH 


075802 


ANG 


145 


HKCEECGKAFSRSSGLIQHQRIH


Z189 


ANG 


146 


HKCEECGKAFSRSSGLIQHQRIH 


075802 


ANG 


147 


HKCDECGKAFSRNSGLIQHQRIH


Q13398 


YYG 


148 


HECNECGKSFSRSSSLIHHRRLH 


Z195 


YAA 


149 


YKCDECGKNFTQSSNLIVHKRIH 


043309 


CYA 


150 


YKCDKCGKAFTQRSVLTEHQRIH 


Z195 


CGA 


151 


YKCDECGKAYTQSSHLSEHRRIH 


ZN45 


YYA 


152 


YKCERCGKAFSQFSSLQVHQRVH 


O60893 


YYN 


153 


YECEDCGKTFIGSSALVIHQRVH 


ZN07 


TAT 


154 


YECLQCGKAFSMSTQLTIHQRVH 


060893 


CYA 


155 


YECDDCGKTFSQSCSLLEHHKIH 
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Q15776 


NGG 


156 


YECDECGKTFRRSSHLIGHQRSH


ZN84 


YGG 


157 


YECGECGKAFSRKSHLISHWRTH


Z177 


YGA 


158 


YECDHCGKSFSQSSHLNVHKRTH 


043296 


AYG 


159 


YECMECGKAFNRKSYLTQHQRIH 


043296 


GNG 


160 


YECVECGKAFTRMSGLTRHKRIH 


043340 


AGG 


161 


YECRECGKSFTRKNHLIQHKTVH 


Z134 


AAG 


162 


YECSECGKTFSRKDWLTQHKRIH 


043338 


CGA 


163 


YECSECGKSFSQTSHLNDHRRIH 


075467 


AGA 


164 


YECAQCGKAFSQTSHLTQHQRIH 


Z135 


AGA 


165 


YECSECGKAFRQSIHLTQHLRIH 


Z135 


AGA 


166 


YECHDCGKSFRQSTHLTQHRRIH 


Z205 


AGG 


167 


YACTDCGKRFGRSSHLIQHQIIH


043296 


AGG 


168 


YECTECGKTFIKSTHLLQHHMIH 


075290 


AAG 


169 


YECKECGKYFSRSANLIQHQSIH


075290 


AGG 


170 


YECKECGKGFNRGAHLIQHQKIH 


075290 


AGG 


171 


YECKECGKGFNRGAHLIQHQKIH 


060792 


CGA 


172 


YTCNECGKAFSQRGHFMEHQKIH 


075123 


CGA 


173 


YTCDQCGKGFGQSSHLMEHQRIH


043337 


GYA 


174 


YECNACGKAFSQSSTLIRHYLIH


075802 


GYY 


175 


YECNYCGKTFSVSSTLIRHQRIH


Z165 


GGY 


176 


YECSECGKTFRVSSHLIRHFRIH 


Z124


CYY


177


YVCNNCGKGFRCSSSLRDHERTH


Z135 


AYY 


178 


YGCNECGKTFSHSSSLSQHERTH


\y J. ^ ^ o -L 








075123 


AAA 


180 




Q13398 


AAY 


181 


YVCGECGKSFSHSSNLKNHQRVH


ZN35 


YYA 


182 


YTCNECGKAFRQRSSLTVHQRTH


Z157


YYC


183


YECTECGKTFSEKATLTIHQRTH


043338


GYY


184


YECDECGKAFGSKSTLVRHQRTH


ZN84 


TYC 


185 


YECSECGKAFGEKSSLATHQRTH


ZN07 


GAA 


186 


YGCRECGKAFSQQSQLVRHQRTH


ZN84 


YAA 


187 


YNCSQCGKAFSQKSQLTSHQRTH



Z186


YGY 


188 


YACDHCEKAFSHKSKLTVHQRTH



043338 


GGC 


189 


YVCGECGKAFMFKSKLVRHQRTH


OZF 


YYA 


190 


YECNVCGKAFSQSSSLTVHVRSH


095779 


YYY 


191 


YKCKECGKAFNHCSLLTIHERTH 


Z135 


GYY 


192 


YACRDCGKAFTHS S S LTKHQRTH 


ZN80 


GYA 


193 


YECKECGKGFYYSYSLTRHTRSH 


Z177 


GYC 


194 


YECSDCGKAFIDQSSLKKHTRSH 


Z177 


GYY 


195 


YDCKECGKAFTVPSSLQKHVRTH 
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043337 


ACT 


196 


YDCMACGKAFRCSSELIQHQRIH 


Q14585 


AGY 


197 


YECKECEKAFRSGSKLIQHQRMH 


Q14585 


AAY 


198 


YECIDCGKAFGSGSNLTQHRRIH 


Q14585 


GYY 


199 


YECKACGMAFSSGSALTRHQRIH 


Q14585 


AYY 


200 


YECKECGKAFYSGS SLTQHQRIH 


Q14585 


AAY 


201 


YECKECGKAFGSGANLAYHQRIH


Q14585 


GAY 


202 


FECKECGKAFGSGSNLTHHQRIH 


Q14585 


ACY 


203 


YVCKECGKAFNSGSDLTQHQRIH


060792 


ACY 


204 


YQCHECGKTFSYGSSLIQHRKIH 


060893 


GNA 


205 


HYCHECGKSFAQSSGLTKHRRIH 


Z165 


GCC 


206 


YECNECGKSFAESSDLTRHRRIH 


060893 


GAY 


207 


YECEECGKVFSHSSNLIKHQRTH 


Q15776 


NGY 


208 


YECNECGKAFSHSSHLIGHQRIH 


Z135 


GYY 


209 


YQCGECGKAFSHSSSLTKHQRIH 


Z165 


GGY 


210 


HQCNECGKAFRHSSKLARHQRIH 


Z135 


TYG 


211 


YECHECLKGFRNSSALTKHQRIH 


043361 


YGC 


212 


YECNECGKFFLDSYKLVIHQRIH 


043361 


YGC 


213 


YECSECGKFFRDSYKLI IHQRVH 


Z140 


YYG 


214 


YGCHECGKTFGRRFSLVLHQRTH 


060792 


AAA 


215 


YECNECGKAFSQHSNLTQHQKTH 


Z135 


ANA 


216 


YKCTQCGRTFNQIAPLIQHQRTH 


Z135 


ANA 


217 


YECNQCGRAFSQLAPLI QHQRI H 


Z135 


ANA 


218 


YECHECGKAFTQITPLIQHQRTH 


043309 


AGA 


219 


YKCNECGKAFGRWSALNQHQRLH 


ZN83 


AGA 


220 


YKCNECGKVFHNMSHLAQHRRIH


ZN83 


AGY 


221 


YRCNVCGKVFHHISHLAQHQRIH 


ZN83 


AGA 


222 


YKCNECGKVFNQISHLAQHQRIH


014709 


CAY 


223 


FECSECGRAFSSNRNLIEHKRIH 


ZN74 


GYA 


224 


YKCSECGRAFSQNHCLIKHQKIH 


Q13398 


ANA 


225 


YECSECGKSPSQNPSLIYHQRVH 


075123 


GYA 


226 


FECKECGKGFSQSSLLIRHQRIH 


Z132 (a) 


GGA 


227 


FECSECGRDFSQSSHLLRHQKVH


Z132 


GYA 


228 


YECNECGKFFSQNS I L I KHQKVH 


Z132 (b) 


GGA 


229 


YECDECGKAFSNRSHLIRHEKVH 


Z132 


GGN 


230 


YECSECGRAFSSNSHLVRHQRVH 


Z132 


AAA 


231 


YECSECGRAFNNNSNLAQHQKVH 


Z134 


ATY 


232 


YKCSDCGKVPRHKSTLVQHES IH 


075290 


AAT 


233 


YECKECGKAFRLYLQLSQHQKTH 


Z157 


AYC 


234 


YECGECGKNFRAKKSLNQHQRIH 


Z157 


TTT 


235 


YECGECGKFFRMKMTLNNHQRTH 
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ZN07 


AAT 


236 


YECAECGKVFRLCSQLNQHQRIH 


Z157 


AYT 


237 


YECSECGKIFSMKKSLCQHRRTH 


043361 


GGY 


238 


YECNKCGKFFMYNSKLIRHQKVH 


043361 


GTY 


239 


YKCSKCGKFFRYRCTLSRHQKVH 


Z157 


CGY 


240 


YECaSTECGNAFYVKARLIEHQRMH 


Z157 


CGY 


241 


YECSECGNAFYVKVRLIEHQRIH 


075123 


AGO 


242 


FECNECGKAF I RS S KL I QHQRI H 


ZN07 


ACT 


243 


FKCTECGKAFRLSSKLIQHQRIH 


075123 


GYT 


244 


YECNECGKAFFLSSYLIRHQKIH 


075802 


AAT 


245 


HKCGECGKAFRLSTYLIQHQKIH 


Z174 


GCG 


RNA 


246 


YKCDDCGKS FTWNSELKRHKRVH 


Z202 


GCG 


RNA 


247 


YRCDDCGKHFRWTSDLVRHQRTH 


043345 


GTG 


RNA 


248 


YKCEECGKAYKWPSTLSYHKKIH


043345 


CA? 


RNA 


249 


YKCEECGKAFNWSSNLMEHKKIH 


075346 


TAA 


250 


YRCEECGKAFNQSANLTTHKRIH 


ZN43 


TAA 


251 


YKCEECGKAFTQSSNLTTHKKIH 


ZN85 


GGA 


252 


YKCEECGKAFNQSSKLTKHKKIH 


ZN85 


GAA 


253 


YTCEECGKAFNQSSNLTKHKRIH 


Q02313 


GAA 


254 


YKCEECGKAFNQLSNLTRHKVIH 


Q02313 


CAA 


255 


YKCEECGKAFKQFSNLTDHKKIH 


Z141 


GTG 


256 


YKCEECGKAFNRSTTLTKHKRIH


ZN91 


TTG 


257 


YKCEECGKAFSRSSTLTKHKTIH
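
The DNA site column of the table above uses single-letter symbols beyond A, C, G and T. Assuming these follow the standard IUPAC nucleotide codes (N for any base, Y for a pyrimidine, R for a purine), a degenerate triplet can be expanded into the concrete triplets it covers. The following is a minimal sketch under that assumption. The function names `expand_site` and `site_matches` are hypothetical and not part of the disclosure, and composite entries such as `N(N/T)A` or `TTT(YYY)` are deliberately not handled.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes; only these symbols are supported here.
# Treating the table's N and Y as IUPAC codes is an assumption of this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}


def expand_site(site):
    """Return every concrete DNA triplet covered by a degenerate site string."""
    return ["".join(bases) for bases in product(*(IUPAC[s] for s in site))]


def site_matches(site, triplet):
    """Return True if a concrete triplet is covered by a degenerate site string."""
    return len(site) == len(triplet) and all(
        base in IUPAC[symbol] for symbol, base in zip(site, triplet)
    )


if __name__ == "__main__":
    # Sites taken from the DNA site column of the table.
    for site in ("GCG", "NGT", "GYY"):
        print(site, expand_site(site))
```

A lookup table keeps the expansion explicit and makes unsupported symbols fail loudly with a `KeyError` rather than being silently skipped.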
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Example 2: List of all human C2H2 zinc fingers 

This list represents an even more comprehensive database of human zinc fingers,
including those with non-DNA-binding activities such as those mediating protein-protein
interactions and those involved in RNA binding. By including fingers from this database
in a natural finger selection system as disclosed herein, many new zinc finger proteins
having unique target specificities can be obtained. All of these peptides would
necessarily possess properties required for potential therapeutic agents, such as
non-immunogenicity.

The fingers listed below are in a format that can be linked with classical canonical
"TGEKP" linkers (i.e., ... TGEKP - zinc finger peptide sequence - TGEKP - zinc finger
peptide sequence - TGEKP - etc.). For each peptide sequence, an oligonucleotide is
designed to encode the peptide sequence; the oligonucleotide can then be linked into a
library selection system, as described in the Examples infra.
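
The linking scheme described above can be illustrated with a short sketch that joins finger peptides with the "TGEKP" linker and back-translates the result into one possible coding sequence. This is illustrative only. The helper names `join_fingers` and `back_translate` are hypothetical, and the codon table picks one arbitrary codon per amino acid from the standard genetic code. The disclosure does not specify codon usage, flanking linkers, or cloning sites, so none of those are modelled here.

```python
LINKER = "TGEKP"  # canonical linker named in the text

# One codon per amino acid from the standard genetic code.
# The particular codon chosen for each residue is an assumption of this sketch.
CODON = {
    "A": "GCT", "C": "TGC", "D": "GAC", "E": "GAA", "F": "TTC",
    "G": "GGT", "H": "CAC", "I": "ATC", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "TCT", "T": "ACC", "V": "GTT", "W": "TGG", "Y": "TAC",
}


def join_fingers(fingers, linker=LINKER):
    """Concatenate finger peptides, placing the linker between adjacent fingers."""
    return linker.join(fingers)


def back_translate(peptide):
    """Return one DNA sequence encoding the peptide (raises KeyError on unknown residues)."""
    return "".join(CODON[residue] for residue in peptide)


if __name__ == "__main__":
    # SEQ ID NOs 258 and 259 from the list below, used purely as sample input.
    fingers = ["HQCAHCEKTFNRKDHLKNHFQTH", "HQCAHCEKTFNRKDHLKNHLQTH"]
    peptide = join_fingers(fingers)
    print(peptide)
    print(back_translate(peptide))
```

Because `back_translate` raises on any character outside the twenty standard residues, it also acts as a quick check for OCR damage in a transcribed sequence before an oligonucleotide is designed.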




Human zinc finger database 




968 finger units






Name 


SEQ ID NO


Peptide sequence 


Q92981_HUMAN 


258 


HQCAHCEKTFNRKDHLKNHFQTH 


O76019_HUMAN 


259 


HQCAHCEKTFNRKDHLKNHLQTH 


ZPY_HUMAN 


260 


HRCEYCKKGFRRPSEKNQHIMRH 


ZFX_HUMAN 


261 


HRCEYCKKGFRRPSEKNQHIMRH 


ZFX_BOVIN 


262 


HRCEYCKKGFRRPSEKNQHIMRH


Q15558_HUMAN 


263 


HRCEYCKKGFRRPSEKNQHIMRH 


ZFX_HUMAN 


264 


HKCDMCDKGFHRPSELKKHVAAH 


ZFY_HUMAN 


265 


HKCEMCEKGFHRPSELKKHVAVH 


Q15558_HUMAN 


266 


HKCEMCEKGFHRPSELKKHVAVH 


Z161_HUMAN 


267 


YTCSVCGKGFSRPDHLSCHVKHVH 


MAZ_HUMAN 


268 


YNCSHCGKSFSRPDHLNSHVRQVH 


043829_HUMAN 


269 


YSCEVCGKSPIRAPDLKKHERVH 


O00403_HUMAN 


270 


YSCEVCGKSPIRAPDLKKHERVH


Z151_HUMAN


271 


HKCPHCDKKPNQVGNLKAHLKIH 


Q92618 HUMAN 


272 


YKCPYCDHRASQKGNLKIHIRSH 


ZFX_HUMAN 


273 


FRCKRCRKGFRQQSELKKHMKTH 


014*52 6_HUMAN 


274 


YPCTICGKKFTQRGTMTRHMRSH 


HKR3_HUMAN


275 


FECTECGYKFTRQAHLRRHMEIH 


Q14526_HUMAN


276 


YACDACGMRFTRQYRLTEHMRIH 


075626_HUMAN 


277 


YECNVCAKTFGQLSNLKVHLRVH 


CTCF_HUMAN


278 


HKCPDCDMAFVTSGELVRHRRYKH 


075701 HUMAN 


279 


YSCPDCSLRFAYTSLLAIHRRIH 



WO 02/099084



PCT/US02/22272 






O75701_HUMAN 


280 


YACSDCKSRFTYPYLLAIHQRKH 


043167_HUMAN 


281 


YACKDCGKVFKYNHFLAIHQRSH 


075850_HUMAN 


282 


CACPDCGRSFTQRAHMLLHQRSH 


O75850_HUMAN 


283 


YACPDCGRGFSHGQHLARHPRVH 


ZN42_HUMAN 


284 


FVCGDCGQGFVRSARLEEHRRVH 


075467_HUMAN 


285 


FRCVDCGKAFAKGAVLLSHRRIH 


O15015_HUMAN 


286 


YKCSECGRAYRHRGSLVNHRHSH 


O75701_HUMAN 


287 


YPCPDCGRRFRQRGSLAIHRRAH 


Q92951_HUMAN


288 


YECAI CQRSFRNQSNLAVHRRVH 


BCL6_HUMAN 


289 


YKCDRCQASFRYKGNLASHKTVH


ZN42_HUMAN 


290 


YACQDCGRRFHQSTKLIQHQRVH 


O75701_HUMAN 


291 


YPCPDCGRRFTYSSLLLSHRRIH 


O75701_HUMAN 


292 


HVCTDCGRRFTYPSLLVSHRRMH 


O75701_HUMAN


293 


HSCPDCGRNFSYPSLLASHQRVH 


ZN42_HUMAN


294 


YACVECGERFGRRSVLLQHRRVH


043298_HUMAN 


295 


YGCGVCGKKFKMKHHLVGHMKI H 


O15209_HUMAN


296 


YDCPVCNKKFKMKHHLTEHMKTH 


043829_HUMAN 


297 


YACHMCDKAFKHKSHLKDHERRH 


O00403_HUMAN 


298 


YACHMCDKAFKHKSHLKDHERRH 


060315 HUMAN 


299 


HQCQICKKAFKHKHHLIEHSRLH 


Q12924_HUMAN


300 


HECGICKKAFKHKHHLIEHMRLH 


NIL2_HUMAN 


301 


HECGI CKKAFKHKHHLI EHMRLH 


Q12924_HUMAN 


302 


FKCTECGKAFKYKHHLKEHLRIH 


O60315_HUMAN 


303 


FKCTECGKAFKYKHHLKEHLRIH 


NIL2 HUMAN 


304 


FKCTECGKAFKYKHHLKEHLRIH 


O95780_HUMAN 


305 


YKCEECGKAFKRCSHLNEHKRVQ 


095779 HUMAN 


306 


YKCEECGKAFKRCSHLNEHKRVQ 


043296 HUMAN 


307 


FKCSECGKVFNKKHLLAGHEKIH 


014709 HUMAN 


308 


YKCKECGKGFYRHSGLI IHLRRH 


014709 HUMAN 


309 


HKCKECGKGF I QRS SLLMHLRNH 


ZN80 HUMAN 


310 


CKCVECGKVFNRRSHLLCYRQIH 


043337_HUMAN 


311 


YKCIECGKAFKRRSHLLQHQRVH 


O60765_HUMAN 


312 


YI CKECGKAFTLSTSLYKHLRTH 


Z136_HUMAN 


313 


FECKRCGKAFRSSSSFRLHERTH 


Z136_HUMAN 


314 


F VCKQCGKAFRSASTFQ IHERTH 


Z136_HUMAN 


315 


YVCKHCGKAFVSSTSIRIHERTH 


Z136 HUMAN 


316 


FKCKQCGKAFSCSPTLRIHERTH 


Z124 HUMAN 


317 


YVCNNCGKGFRCS SSLRDHERTH 


Z177_HUMAN


318 


YECKECGKAFRNSSCLRVHVRTH 


Z124_HUMAN 


319 


YECKHCGKAFRYSNCLHYHERTH 


O95780_HUMAN 


320 


YKCKECGKAFNHCSLLTIHERTH 


095779_HUMAN 


321 


YKCKECGKAFNHCSLLTIHERTH 


Z124_HUMAN


322 


YPCKQCGKAFRYAS SLQKHEKTH 


Z136_HUMAN 


323 


YECKQCGKAFSYLNSFRTHEMIH 


Z136_HUMAN 


324 


YECKQCGKAFSYLPSLRLHERIH 


O15060_HUMAN 


325 


YSCKVCGKRPAHTSEFNYHRRI H 


Z136__HUMAN 


326 


YKCKVCGKPFHSLSPFRIHERTH 


Z136_HUMAN 


327 


YKCKVCGKPPHSLSSFQVHERIH 
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Z136_HUMAN 328 

ZN35_HUMAN 329 

015322_HUMAN 330 

Q92951_HUMAN 331 

Q92951_HUMAN 332

Q92951_HUMAN 333

OZF_HUMAN 334

OZF_HUMAN 335

ZN07_HUMAN 336

Z151_HUMAN 337

Z177_HUMAN 338

OZF_HUMAN 339

Z177_HUMAN 340 

Z177_HUMAN 341 

O60792_HUMAN 342 

Z161_HUMAN 343 

Z161_HUMAN 344 

MAZ_HUMAN 345 

O60792_HU]yiAN 346 

O60792_HUMAN 347 

Z263_HUMAN 348 

Z263_HUMAN 349 

Z135_HUMAN 350 

Z135_HUMAN 351 

Z135_HUMAN 352

075467_HUMAN 353

ZN07_HUMAN 354

O95270_HUMAN 355

GFI1_HUMAN 356

O75850_HUMAN 357 

Q15552_HUMAN 358 

043591_HUMAN 359 

Q15552_HUMAN 360 

043591_HUMAN 361 

O75850_HUiyiAN 362 

O75850_HUMA]SI 363 

094892_HUMAN 364 

043336_HUMAN 365 

043167_HUMAN 366 

043167_HUMAN 367 

PLZF_HUMAN 368 

HKR3_HUMAN 369 

043336_HUMAN 370 

043336_HUMAN 371 

Z134_HUMAN 372 

Z200_HU]yiAN 373 

015361_HUMAN 374 

ZN84 HUMAN 375 




YKCKVCGKAFDYPSRFRTHERSH 

YVCNECGKAFTCSSYLLIHQRIH 

YNCKECGKSFRWSSYLLIHQRIH 

YRCDQCGKAFSQKGSLIVHIRVH 

YQCKECGKSFSQRGSLAVHERLH 

YECQECGKSFRQKGSLTLHERIH 

YECNECGKAFSQRTSLIVHVRIH 

YECNVCGKAFSQSSSLTVHVRSH 

YVCNDCGKAFSQSSSLIYHQRIH 

CQCVMCGKAFTQASSLIAHVRQH 

YDCKECGKAFTVPSSLQKHVRTH 

FECKDCGKAFIQKSNLIRHQRTH 

YECSDCGKAFIDQSSLKKHTRSH 

YECSDCGKAFI FQSSLKKHMRSH 

YECKECGKAFIRSSSLAKHERIH 

YACTYCSKAFRDSYHLRRHESCH 

HACEMCGKAFRDVYHLNRHKLSH 

HACEMCGKAFRDVYHLNRHKLSH 

FKCDECDKTFTRSTHLTQHQKIH 

YKCNECDKAFSRSTHLTEHQNTH 

YKCNECGKSFRQGMHLTRHQRTH 

HKCLECGKCFSQNTHLTRHQRTH 

YECSQCGKAFRQSTHLTQHQRIH 

YECHDCGKSFRQSTHLTQHRRIH

YECSECGKAFRQSIHLTQHLRIH

YECAQCGKAFSQTSHLTQHQRIH 

YECLQCGKAFSMSTQLTIHQRVH 

YPCQFCGKRFHQKSDMKKHTYIH 

YPCQYCGKRFHQKSDMKKHTFIH 

FPCTECEKRFRKKTHLIRHQRIH 

FRCDECGMRS IQKYHMERHKRTH 

FRCDECGMRFIQKYHMERHKRTH 

FQCSQCDMRFIQKYLLQRHEKIH 

FQCSQCDMRFIQKYLLQRHEKIH 

FPCSECDKRFSKKAHLTRHLRTH 

YPCAECGKRFSQKIHLGSHQKTH 

FMCSECGKGFTMKRYLIVHQQIH 

YQCSECGKSFIYKQSLLDHHRIH 

FKCNECGKGFAQKHSLQVHTRMH 

YTCDQCGKYFSQNRQLKSHYRVH 

YECNGCDKKFSLKHQLETHYRVH 

YACPTCHKKFLSKYYLKVHNRKH 

YVCNVCGKSFRHKQTFVGHQQRIH 

YVCNI CGKSFLHKQTLVGHQQRIH 

YDCSDCGKSFGHKYTLIKHQRIH 

YDCNHCGKSFNHKTNLNKHERIH 

YDCNHCGKSFNHKTNLNKHERIH 

YDCNHCGKAFSRKSQLVRHQRTH 
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ZN84_HUMAN 376 

ZN07_HUiyiAN 3 77 

ZN84_HUMAN 378 

ZN84_HUMAN 379 

ZN84_HUMAN 3 80 

ZN84_HUiyiAN 381 

ZN84_HUMAN 3 82 

Z157_HUiyiAN 383 

ZN84_HUMAN 384 

ZN84_HUMAN 385 

Z136_HUMAN 386 

Z136_HUMAN 387 

Z136_HU]yiAN 388 

ZN80_HUMAN 389 

043338_HUMAN 390 

043338_HUiyiAN 391 

Z133_HUMAN 392 

Z133_HUMAN 393 

Z133_HUMAN 394 

Z133___HUMAN 395 

Z133_HUMAN 396 

Z133_HUMAN 397 

Z133_HUMAN 398 

Z133_HUMAN 399 

094892_HUMAN 400 

094892_HUMAN 401 

094892_HUMAN 402 

094892_HUMAN 403 

094892_HUMAN 404 

094892_HUMAN 405 

094892_HUMAN 406 

094892__HUMAN 407 

094892_HUMAJSr 408 

094892__HUMAN 409 

Z186_HUMAN 410 

Z186_HUMAN 411 

Z186_HUMA]Sr 412 

ZN35_HUiyiAN 413 

ZlSe^HUMAN 414 

Z157_HUMAN 415 

Z186_HUMAN 416 

Z157_HUMAN 417 

ZN84_HUMAN 418 

ZN84_HUiyiAN 419 

Z140__HUMAN 420 

ZN84_HUMAN 421 

ZN84_HUMA]Sr 422 

ZN84 HUMAN 423 
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FECRECGKAFSRKSQLVTHHRTH 
YGCRECGKAFSQQSQLVRHQRTH 
YRCIECGKAFSQKSQLINHQRTH 
YGCSECRKAFSQKSQLVNHQRIH 
HGCIQCGKAFSQKSHLISHQMTH 
YNCSQCGKAFSQKSQLTSHQRTH 
YVCSECGKAFCQKSHLISHQRTH
FECNECGKSFGRKSQLILHTRTH
FECSECGKAFSRKSHLIPHQRTH
YECGECGKAFSRKSHLISHWRTH
YHCKECGKAYSCRASFQRHMLTH
YECKECGEAFSCIPSMRRHMIKH
YECQECGKAFTCITSVRRHMIKH
YECQECGKAFPEKVDFVRHMRIH 
YVCGECGKAFMFKSKLVRHQRTH 
YECDECGKAFGSKSTLVRHQRTH 
YACGECGRGFSQKSNLVAHQRTH 
YMCSECGRGFSQKSNLI IHQRTH 
YACKDCGRGFSQQSNLIRHQRTH 
YACSDCGLGFSDRSNLISHQRTH 
YACRECGRGFNRKSTLIIHERTH 
YVCRECGRGFSHQAGLIRHKRKH 
CVCRECGQGFLQKSHLTLHQMTH 
YVCRECGKGFSQKSAWRHQRTH 
YICSECGKGFPRKSNLIVHQRNH 
YI CNECGKGFPGKRNLI VHQRNH 
YTCSECGKGFPLKSRLIVHQRTH 
YICSECGKGFTTKHYVI IHQRNH 
YICSECGKGFTGKSMLI IHQRTH 
YLCSECGKGFTVKSMLIIHQRTH 
YGCNECGKGFTMKSRL I VHQRTH 
YI CNECGKGFTMKSRMIEHQRTH 
FICSECGKVFTMKSRLIEHQRTH 
YI CNECGKGFAFKSNL WHQRTH 
YECNECGKTFHQKSFLTVHQRTH 
YECNELGKTFHCKSFLTVHQKTH 
YGCNECGKTVRCKSFLTLHQRTH 
YTCNECGKAFRQRSSLTVHQRTH 
YQCSECGKTFSQKSYLTIHHRTH 
YECSECGKTFRVKI SLTQHHRTH 
YKCIECGKTFTVNQLLTLHHRTH 
YECTECGKTFSEKATLTIHQRTH 
YACSDCRKAFFEKSELIRHQTIH 
YECSLCRKAFFEKSELIRHLRTH 
YECNECRKALRCHSFLIKHQRIH 
YECNECRKAFREKSSLINHQRIH 
YECSECRKAFRERSSLINHQRTH 
YECSECGKAFGEKSSLATHQRTH
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ZN84_HUMAN 


424 


YECSECGKAFSEKLSLTNHQRIH 


04333 9_HUMAN 


425 


YECSKCGKAPRGKYSLVQHQRVH 


Z157_HU]yiAN 


426 


YECSECGKIFSMKKSLCQHRRTH 


Z157_HUMAN 


427 


YECGECGKFFRMKMTLNNHQRTH 


Z157_HUMAN 


428 


YECGECGKNFRAKKSLNQHQRIH 


043361_HUMAN 


429 


YKCSECGKAFSLKHNWQHLKIH 


Z134_HUMAN 


430 


YECSECGKAFSRKATLVQHQRIH 


Z134_HU]yiAN 


431 ' 


YKCSECGKAFSRKDTLVQHQRIH 


Z134_HUMAN 


432 


YECSECGKTFSRKDNLTQHKRIH 


O14709_HUMAN 


433 


YKCKECGKVFIRSKSLLLHQRVH 


014709_HUMAN 


434 


YECDECGKCFILKKSLIGHQRIH 


O14709_HUMAN 


435 


YECNECGKVF ILKKSLILHQRFH 


O14709_HUMAN 


436 


YKCNKCQKAFILKKSLILHQRIH 


Z140_HUMAN 


437 


YACAECDKAF S RS FSLI LHQRTH 


Z140_HUMAN 


438 


YGCHECGKTFGRRFSLVLHQRTH 


095878_HUiyiAN 


439 


YACAQCGKTFNNTSNLRTHQRIH 


O14709_HUMAN 


440 


YKCDMCCKHFNKI SHLINHRRIH 


ZN83_HUMAN 


'441 


FKCDICGKIFNKKSNIiASHQRIH 


ZN07_HUiyiAN 


442 


HQCEDCEKIFRWRSHLIIHQRIH 


Z137_HUMAN 


443 


HKCDDCGKVLTSRSHLIRHQRIH 


Z140_HUMAN 


444 


HECKDCNKTFSYLSFLIEHQRTH 


Z189_HUMAN 


445 


HKCSDCGKAFSWKSHLIEHQRTH 


O75802_HUMAN 


446 


HKCSDCGKAFSWKSHLIEHQRTH 


O14709_HUMAN 


447 


YKCNDCGKVFSYRSNLIAHQRIH 


O43309_HUMAN 


448 


YGCDDCGKAFSQHSHLIEHQRIH 


075123_HUMAN 


449 


YTCDQCGKGFGQSSHLMEHQRIH 


043336_HUMAN 


450 


YNCTACEKAFI YKNKLVEHQRIH 


O43309_HUMAN 


451 


YKCDVCEKAFIQRTSLTEHQRIH 


O60792_HUMAN 


452 


YKCDQCGKGFI EGPSLTQHQRIH 


043309 HUMAN 


453 


YKCDKCGKAFTQRS VLTEHQ^II H 


ZN91 HUMAN 


454 


YKCEECGKAFKQLSTLTTHKRIH 


ZN91 HUMAN 


455 


YKCKECGKAFKQFSTLTTHKI IH 


ZN 9 INHUMAN 


456 


YKCKECDKTFKRLSTLTKHKI IH 


ZN9 INHUMAN 


457 


YKCKECDKTFKRLSTLTKHKIIH 


ZN85_HUMAN 


458 


YKCEKCGKAFNHFSHLTTHKI IH 


ZN85_HUMAN 


459 


YKCEECGKAFNRFSTLTTHKI IH 


ZN43 HUMAN 


460 


YKCEECGKAFNQFSTLTKHKI IH 


ZN43_HUMAN 


461 


YTCEECGKVFNWSSRLTTHKRIH 


ZN43 HUMAN 


462 


YKCEECGKAFNKSSILTTHKI IR 


07543 7_HUiyiAN 


463 


YKWEKFGKAFNRSSHLTTDKITH 


043345_HUMAN 


464 


YKCEEGGKAFNWSSTLTYYKSAH 


ZN9 INHUMAN 


465 


YKCEECGKAFNQSSNLTTHKI IH 


ZN9 INHUMAN 


467 


YKCEECGKAFNRSSKLTTHKI IH 


Q02313_HUMAN 


468 


YKCEECGKAFNQSSTLTTHNI IH 


ZN9 INHUMAN 


469 


YKCEECGKAFNHSSSLSTHKIIH 


ZN43 HUMAN 


470 


YKCBECGKAFKLSSTLSTHKI IH 


ZN91 HUMAN 


471 


YKCEECGKAFSQSSTLTTHKI IH 


Q02313 HUMAN 


472 


YKCEECGKAFNQSSTLTTHKRIH 
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O95780_HUMAN 


473 


YKCEECGKAFNS SSI LTEHKVI H 


095779_HUMAN 


474 


YKCEECGKAFNS SSI LTEHKVIH 


ZN9 INHUMAN 


475 


YKCKECGKAFKHS SALAKHKIIH 


ZN85_HUMAN 


476 


YKCKECGKAFKHSSTLTKHKI IH 


ZN85_HUMAJSI 


477 


YKCEECDKAFKWSSVLTKHKIIH 


ZN43__HUMAN 


478 


YKCEECGKAFKWSSTLTKHKI IH 


ZN85_HUMAN 


479 


YKCEECGKGFKWPSTLTIHKI IH 


ZN9 INHUMAN 


480 


YKCGECGKAFKESSALTKHKI IH 


ZN9 INHUMAN 


481 


YKCEECGKAFRKSSTLTEHKIIH 


ZN9 INHUMAN 


482 


YKCEECGKAFRQSSTLTKHKI IH 


Q02313_HUMAN 


483 


YKCGECGKAFNQS SALNTHKI IH 


ZN9 INHUMAN 


484 


CKCKECEKTFHWSSTLTNHKEIH 


07543 7_HUMAN 


485 


YKCKECGKTFNWSSTLTNHRKIY 


ZN9 INHUMAN 


486 


YKCKECGKAFSNSSTLANHKITH 


ZN9 INHUMAN 


487 


YKCKECGKAFSNSSTLANHKITH 


043345_HUMAN 


488 


YKCKECGKTFIKVSTLTTHKAIH 


043345_HUMAN 


489 


YKCEECGKTFSKVSTLTTHKAIH 


043345__HUMAN 


490 


YKCEECGKTFSKVSTLTTHKAIH 


043345_HUMAN 


491 


YKCEECGKAFSKVSTLTTHKAIH 


043345_HUMAN 


492 


YKCKECGKAFSKVSTLITHKAIH 


O95270_HUMAN 


493 


YACRMCGKAFKRSSTLSTHLLIH 


GF I INHUMAN 


494 


YDCKICGKSFKRSSTLSTHLLIH 


07534 S^HIMAN 


495 


YKCI ICGKAFKRSSTLTTHKKIH 


ZN43_HUMAN 


496 


YKCKECGKAFNQYSNLTTHNKIH 


ZN85_HUMAN 


497 


YKCKECGKAFNRSSTLTTHRKIH 


ZN9 INHUMAN 


498 


YKCSEECDKAFIWSSTLTEHKRIH 


ZN9 INHUMAN 


499 


YKCEECGKAFISSSTLNGHKRIH 


ZN43_HUMAN 


500 


YKCEECGKAFNYSSHLNTHKRIH 


O95780_HUMAN 


501 


YKCEECGKAFNWSSILTEHKRIH 


095779_HUMAN 


502 


YKCEECGKAFNWSSILTEHKRIH 


043345_HUMAN 


503 


YKCEECGKAFNWSSNLMEHKRIH 


043345_HUMAN 


504 


YKCEECGKAFNWSSNLMEHKRIH 


043345_HUMAN 


505 


YKCEECGKAFNWSSNLMEHKKIH 


043345_HUMAN 


506 


YKCEECGKAFNWSSNLMEHKKIH 


ZN9 INHUMAN 


507 


FKCKECGKAFIWSSTLTRHKRIH 


ZN9 INHUMAN 


508 


FKCKECGKGFIWSSTLTRHKRIH 


ZN9 INHUMAN 


509 


YKCEECGKAFLWSSTLRRHKRIH 


ZN91_HUMAN 


510 


YKCEECGKAFLWSSTLTPHKRIH 


Q02313_HUMAN 


511 


YKCEAYGRAFNWSSTLNKHKRIH 


ZN9 INHUMAN 


512 


YKFEECGKAFRQSLTLNKHKI IH 


Z14 INHUMAN 


513 


YKCEECGKAFRRSTDRSQHKKIH 


075346_HUMAN 


514 


YKCEECGKAFNWSSDLNKHKKIH 


ZN9 INHUMAN 


515 


YKCEECGKAFNWSSSLTKHKRIH 


ZN9 INHUMAN 


516 


YKCEECGKAFNWS S SLTKHKRFH 


ZN85_HU]yiZ^ 


517 


YKCEECGKAFNWSSTLTKHKRIH 


ZN43 HUMAN 


518 


YKCEECGKAFNWPSTLTKHNRIH 


ZN43__HUMAN 


519 


YKCEECGKAFNWPSTLTKHKRIH 


075437 HUMAN 


520 


YKCEECGKAFFWSSTLTKHKRIH 
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O95780_HUMAN 


521 


YKCEECGKAFNWCSSLTKHKRIH 


095.779_HUMAN 


522 


YKCEECGKAFNWC S S LTKHKRI H 


ZN43___HUMAN 


523 


YKCEECGKAFSRSSNLTKHKKIH 


ZN43_HUMAN 


524 


YKCTECGEAPSRSSNLTKHKKIH 


ZN9 INHUMAN 


525 


YKCEECGKAFSRSSTLTKHKTIH


075437_HUiyiAN 


526 


YKCEECGKAFNRSSTFTKHKVIH 


Z141_HUMAN 


527 


YKCEECGKAFNRFTTLTKHKRIH 


Z14 INHUMAN 


528 


YKCEECGKAFNRSTTLTKHKR I H 


ZN43__HUMAN 


529 


CKCEKCGKAFNCPS I I TKHKRIN 


043345_HUMAN 


530 


YKCEACGKAYNTFS ILTKHKVIH 


043345_HUMAN 


531 


YKCEECGKAFS TFS I LTKHKVI H 


043345_HUMAN 


532 


YKCEECGKSFSTFSILTKHKVIH 


043345_HUMAN 


533 


YKCEECGKS FSTFSVLTKHKVI H 


043345_HUMAN 


534 


YKCEECGKGFVMFSILAKHKVIH 


043345_HUMAN 


535 


YKCEECGKGFSMFSILTKHEVIH 


04334 5_HUMAN 


536 


YKCEECGKGFSMFS I LTKHEVI H 


043345_HUMAN 


537 


YKCKECGKAFSKFSILTKHKVIH 


043345_HUMAN 


538 


YKGKECGKAFSKFSILTKHKVIH 


043345_HUMAN 


539 


YKCKECGKAFSKFSILTKHKVIH 


043345_HUMAN 


540 


YRCKECGKAFSKFS I LTKHKVI H 


Z195_HUiyiAN 


541 


FKCEECDSIFKWFSDLTKHKRIH 


09578 0_HXJMAN 


542 


YKCEKCDKVFKRFSYLTKHKRIH 


095779_HUMAN 


543 


YKCEKCDKVFKRFSYLTKHKRIH 


09578 0_HUMAN 


544 


CICEECGKTFKWFSYLTKHKRIH 


095779_HUMAN 


545 


C I CEECGKTFKWFS YLTKHKRI H 


ZN43_HUMAN 


546 


YKCEECGKAF3SIHFSILTKHKRIH 


ZN9 INHUMAN 


547 


YKCEKCCKAFNQS S I LTNHKKI H 


Q02313_HUMAN 


548 


YKCEKCVRAFNQASKLTEHKLI H 


ZN85_HUMAN 


549 


YKSKBCEKAFNQSSKLTEHKKIH 


ZN43_HUMAN 


550 


YKCKECAKAFNQSSNLTEHKKIH 


ZN85_HUMAN 


551 


YKCEECGKAFNQSSKLTKHKKIH 


ZISrSS^HUMAN 


552 


YKCEECGKAFNQSSNLIKHKKIH 


043345_HUMAN 


553 


YKCEECGKAFNRSAILIKHKRIH 


043345_HUiyiAN 


554 


YKCEECGKAFNQSAILIKHKRIH 


043345__HUMAKr 


555 


YKCEECGICAFNQSAILTKHKI IH 


ZN43_HUMAN 


556 


YKCEVCGKAFNQFSNLTTHKRIH 


ZN43_HUMAN 


557 


• YTCEECGKAFNQFSNLTTHKRIH 


075346_HUMAN 


558 


YRCEECGKAFNQSANLTTHKRIH 


ZN85__HUM2\N 


559 


YTCEECGKAFNQSSNLTKHKRIH 


Z14 INHUMAN 


560 


YKCKDCDKAFKRFSHLNKHKKIH 


Z14 INHUMAN 


561 


YKCKECDKZVFKQPSLLSQHKKIH 


Q02313_HUMAN 


562 


YKCEECGKAFKQFSNLTDHKKIH 


ZN43_HUMAN 


563 


YKCEECGKAFTQSSNLTTHKKIH 


ZN43_HUMAN 


564 


YKCEECGKAFTQSSNLTTHKKIH 


ZN85_HUMAN 


565 


YKCEECGKAFKQS SNLTTHKI IH 


Q02313_HUMA]Sr 


566 


YKCEECGKAFNQLSNLTRHKVI H 


ZN85_HUMAN 


567 


YECEKCGKAFNQSSNLTRHKKSH 


095780 HUMAN 


568 


YNCEECGKAFNRCSHLTRHKKI H 
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095779_HUMAN 569 

O95780_HUMAN 570 

095779_HUMAN 571 

Q02313_HUiyiAN 572 

ZN91_HUMAN 573 

ZN91_HUiyiAN 574 

ZN43_HUMAN 575 

Z141_HUMAN 576 

Z141_HUiyiAN 577 

Z141_HUMAN 578 

Z141_HU]yiAN 579 

043345_HUMAN 580 

043345_HUMAN 581 

043345_HUMAN 582 

043345_HUMAN 583 

043345_HUMAN 584 

043345_HUMAN 585 

043345_HUMAN 586 

043345_HUMAN 587 

O95780_HUMAN 588 

095779_HUMAN 589 

Z195_HU]yiAN 590 

Z195_HUMAN 591 

043345_HUMAN 592 

043345_HUMAN 593 

.ZN43_HUMAN 594 

ZN43_HUMAN 595 

ZN9 INHUMAN 596 

ZN9 INHUMAN 597 

ZN91_HUMAN 598 

ZN9 INHUMAN 599 

ZN91_HUMAN 600 

ZN9 INHUMAN 601 

Z124_HUM7Uja 602 

Z141_HUMAN 603 

ZN74_HUMAN 604 

Z195_HUiyiAN 605 

Z195__HUiyiAN 606 

Z195_HUMAN 607 

ZN80_HUMAN 608 

Z165_HUMAN 609 

Q02313_HUMAN 610 

O60792_HUMAN 611 

ZN74_HUMAN 612 

Q15776_HUMAN 613 

O433 09_HUMAN 614 

O43309_HUMAN 615 

075123 HUMAN 616 




YNCEECGKAFNRCSHLTRHKKIH 
YTCEDCGRAFNRHSHLTKHKTIH
YTCEDCGRAFNRHSHLTKHKTIH
YECEECGKAFNRSSKLTEHKYIH
YKCEECGKAFNRSSNLTIHKFIH
YKCEECGKAFNRSSNLTIHKFIH
YKCEKCGKAFNRPSNLIEHKKIH 
YTCEECRKIFTSSSNFAKHKRIH 
FTCEECGSIFTTSSHFAKHKI IH 
YTCEECGKAFKWSLIFNEHKRIH 
YTCEECGKAFRQSSKLNEHKKVH 
YKCEECGKAYKWSSTLSYHKKIH 
YKCEECGKAYKWSSTLSYHKKIH 
YKCEECGKAYKWPSTLSYHKKIH 
YKCEECGKAYKWPSTLSYHKKIH 
YKCEECGKAYKWPSTLRYHKKIH 
YKCEECGKGFSWSSTLSYHKKIH 
YKCEECGKAFSWLSVFSKHKKIH 
YKCEECGKAFSWLSVFSKHKKTH 
YKCEECGKAFHWCSPFVRHKKIH 
YKCEECGKAFHWCS PFVRHKKI H 
YTCEECGNIFKQLSDLTKHKKTH 

YKCEECGRAFMWFSDI TKHKQTH 
YKCEECGKAFSWPSRLTEHKATH 
YKCEECDKAFSWPSSLTEHKATH 
YKCEECGKAFKWSSKLTEHKI TH 
YKCEECGKAFKWSSKLTBHKLTH 
YKCEECGKAF S HS S ALAKHKR I H 
YKCEECGKAFSHSSALAKHKRIH 
YKCEECGKAFSHSSTLAKHKRIH 
YKCEECGKAFSQPSHLTTHKRMH 
YKCEECGKAFSQSSTLTRHKRLH 
YKCEECGKAFSQS STLTRHTRMH 
YECMECGKALGFSRSLNRHKRIH 
YKCDECGKAFGRSRVLNEHKKI H 
YKCDECGKAFTWSTNLLEHRRIH 
YKCDECGKAYTQSSHLSEHRRIH 
YKCDECGKNFTQSSNLIVHKRIH 
YKCDECGKNFTQSSNLIVHKRIH 
YKCKECGSVFNKNSLLVRHQQIH 
FGCKECGRAFNLNSHLIRHQRIH 
YKCKECGKAFNQTSHLIRHKRIH 
YKCNECGRAFNQNIHLTQHKRIH 
YRCGECGKAFNQRTHLTRHHRIH 
YKCKECGKAFNGNTGLIQHLRIH 
YKCDECGNAFRGITSLIQHQRIH 
YKCEECGKAFRGRTVLIRHKI IH 
YVCNECGKRFSQTSNFTQHQRIH 
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O60792_HUMAN 617 

043296__HUMAN 618 

043337_HUMAN 619 

043296_HUMAN 620 

OZF_HUMAN 621 

ZN83_HUMAN 622 

ZN07_HUMAN 623 

ZN83_HUMAN 624 

ZN83_HUMAN 625 

ZN83_HUMAN 626 

ZN83_HUMAN 627 

ZN83_HUMAN 628 

ZN83_HUMAN 629 

ZN83_HUMAN 630 

ZN83_HUMAN 631 

Z189_HUMAN 632 

O75802_HUMAN 633 

ZN83_HUMAN 634 

O60792_HUMAN . 635 

043361_HUMAN 636 

ZN83_HUr^A]SI 637 

O60792_HUMAN 638 

Z137_HUMAN 639 

O14709_HUiyiAN 640 

Z124_HUMAN 641 

O60792_HUMAN 642 

O60792_HUMAN 643 

ZN83_HUiyiAN 644 

ZN83_HUMAN 645 

Z1322HUiyiAN 646 

043339_HUMAN 647 

043338_HUMAN 648 

ZN45_HUMAN 649 

Z]SF45_HUMAN 650 

Z263_HUiyiAN 651 

Z202_HUMAN 652 

O75850_HUMAN 653 

Z205_HUMAN 654 

015535_HUMAN 655 

ZN24_HUMAN 656 

Z191___HUMAN 657 

Q99592_HUMAN 658 

Q13397_HUMAN 659 

Z189_HUMAN 660 

O75802_HUMAN 661 

Z189_HUiyiAN 662 

O75802_HUMAN 663 

Z263 HUMAN 664 




YKCNECGKAFNGPSTFIRHHMIH 
FVCSECGKAFTHCSTFILHKRAH 
YECSQCRKAFTHRSTFIRHNRTH 
YKCNECGKAFTHRSNFVLHlSrRRH 
YGCNECGKAFSQFSTLALHLRIH 
YKCNERGKAFHQGLHLPIHQI IH 
YKCNECGKAFSQNSTLFQHQI IH 
YKCNECGKVFSRNSYLAQHLI IH 
YECNKCGKVF SRNS YLVQHL 1 1 H 
YKCNECGKVFGLNSSLAHHRKIH 
YKCNECGKVFHQI SHLAQHRTIH 
YKCNECGKVFHNMSHLAQHRRI H 
YKCNECGKVFNQI SHLAQHQRIH 
YRCNVCGKVFHHISHLAQHQRIH 
YKCDECGKVFSQNSYLAYHWRIH 
YKCDECGKTFSVSAHLVQHQRIH 
YKCDECGKTFSVSAHLVQHQRIH 
YKCDECDKAFSQNSHLVQHHRIH 
YKCDECGKAFSQRTHLVQHQRIH 
YECGESSKVFKYNSSLIKHQI IH 
FKCNECGKAF SMRS SLTNHHA: H 
YKCNECGKAFSYCSSLTQHRRIH 
YKYHDCGKVFSQASSYAKHRRIH 
YKCEDCGKAF S YNS SLLVHRRI H 
YVCMECGKAFSCLSSLQGHIKAH 
YQCHECGKTFSYGSSLIQHRKIH 
YDCAECGKSFSYWSSLAQHLKIH 
YKCNECGKVFSHKSSLVNHWRIH 
YKCNECGKVFSHKSSLVNHWRIH 
YKCSECGKFFSRKSSLICHWRVH 
YKCNECGKFFSQTSHLNDHRRIH
YECSECGKSFSQTSHLNDHRRIH
YKCNACGKSFSYSSHLNIHCRIH
YKCGTCGKGFSRSSDLNVHCRIH
YKCPLCGKNFSNNSNLIRHQRIH 
YTCPTCGKSFSRGYHLIRHQRTH 
FSCPQCGKSFSRKTHLVRHQLIH 
YACPLCGKSFSRRSNLHRHEKIH 
HQCIECGKSFNRHCNLIRHQKIH 
YECVQCGKSYSQSSNLFRHQRRH 
YECVQCGKSYSQSSNLFRHQRRH
YTCTQCGKSFQYSHNLSRHAWH 
YTCTQCGKS FQYSHNLSRHAWH 
YLCRQCGKSFSQLCNLIRHQGVH 
YLCRQCGKSFSQLCNLIRHQGVH 
YQCKECGKSFSQLCNLTRHQRIH 
YQCKECGKSFSQLCNLTRHQRIH 
YKCTLCGENFSHRSNLIRHQRIH 
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Z263_HUMAN 


665 


YKCPECGEIFAHSSNLLRHQRIH 


095878_HUMAN 


666 


YKCSECGKSFSRSSNRIRHERIH 


Z263_HUMAN 


667 


YTCHECGDSFSHSSNRIRHLRTH 


043336_HUMAN 


668 


YVCI ICGKSFIRSSDYMRHQRIH 


043336_HUMAN 


669 


YVCMECGKSFIHSYDRIRHQRVH 


BCL6_HUMAN 


670


YRCNICGAQFNRPANLKTHTRIH


Z133_HUiyiAN 


671 


YKCGECGLSPSKMTNLLSHQRIH 


ZN75_HUMAN 


672 


YRCSWCGKSFSHNTNLHTHQRIH 


O60893_HUMAN 


673 


YKCNECERSFTRNRSLIEHQKIH 


ZN74_HUMAN 


674 


YKCSECGRAFSQNHCLIKHQKIH 


O14709_HUMAN 


675 


YACSECGKGFTYNRNLIEHQRIH 


Z177_HUiyiAN 


676 


YKCFQCEKAFSTSTNLIMHKRIH 


O60792_HUMAN 


677 


YKCNECEKAFSRSENLINHQRIH 


094892_HUMAN 


678 


YGCTLCAKVFSRKSRLNEHQRIH 


Z189_HUMAN 


679 


YHCTKCKKSFSRNSLLVEHQRIH 


O75802_HUMAN 


680 


YHCTKCKKSFSRNSLLVEHQRIH 


O43309_HUMAN 


681 


YQCTQCNKSFSRRSILTQHQGVH 


015535_HUM7^ 


682 


YQCSQCSKSYSRRSFLIEHQRSH 


Z2 05_HUiyiAN 


683 


YTCPACRKSFSHHSTLIQHQRIH 


Z189_HUMAN 


684 


YTCIECGKSFSRSSFLIEHQRIH 


O75802_HUMAN


685 


YTCIECGKSFSRSSFLIEHQRIH 


Z189_HUMAN 


686 


FQCNECGKSFSRSSFVIEHQRIH


O75802_HUMAN 


687 


FQCNECGKSFSRSSFVIEHQRIH 


Z189_HUMAN 


688 


YLCTVCGKSFSRSSFLIEHQRIH 


O75802_HUMAN 


689 


YLCTVCGKSFSRSSFLIEHQRIH 


O14709_HUMAN 


690 


YECHVCRKVLTSSKMLMVHQRIH


O14709_HUMAN 


691 


YECDKCRKSFTSKRNLVGHQRIH 


ZN35_HUMAN


692 


YECNECGKTFTRSSNLIVHQRIH 


075123_HUMAN 


693 


YECNECGKSFIRSSSLIRHYQIH 


043296_HUMAN 


694 


YECVECGKSFCWSTNLIRHAIIH


043296_HUMAN 


695 


YECSECGKVFLESAALIHHYVIH 


043337_HUMAN 


696 


YECTQCGKAFHRSTYLIQHSVIH 


043296_HUMAN 


697 


YECTECGKTFIKSTHLLQHHMIH 


075290_HUMAN 


698 


YECKECGKYFSRSANLIQHQSIH 


Z205_HUMAN 


699 


YACTDCGKRFGRSSHLIQHQIIH


Z1S5_HUMAN 


700 


YECSECGKTFRVSSHLIRHFRIH 


Q15776_HUMAN 


701 


YECDECGKTFRRSSHLIGHQRSH 


Q15776_HUMAN 


702 


YECNECGKAFSHSSHLIGHQRIH 


Z189_HUMAN 


703 


YECNYCGKTFSVSSTLIRHQRIH 


O75802_HUMAN 


704 


YECNYCGKTFSVSSTLIRHQRIH 


043337_HUMAN 


705 


YECNACGKAFSQSSTLIRHYLIH 


ZN07_HUMAN


706 


YECSECGKAFSRSSYLIEHQRIH 


Z132_HUMAN 


707 


YECSECGKAFAHSSTLIEHWRVH 


O43340_HUMAN


708 


YECSECGKAFSCNIYLIHHQRFH 


Z135_HUMAN 


709 


YECGECGKAFSQSTLLTEHRRIH 


O43338_HUMAN


710 


YECGECGKSFSQSSNLIEHCRIH 


043338_HUMAN 


711 


YECGKCGKSPTQHSGLILHRKSH 


Z140_HUMAN


712 


YECDECGKVFTWHASLIQHTKSH 
Q13398_HUMAN 713

Q13398_HUMAN 714 

O43340_HUMAN 715 

O43340_HUMAN 716 

O43340_HUMAN 717

O43340_HUMAN 718 

O43340_HUMAN 719

O43340_HUMAN 720 

Q13398_HUMAN 721 

Q13398_HUMAN 722 

Z132_HUMAN 723 

Z132_HUMAN 724 

Q13398_HUMAN 725 

043339_HUMAN 726 

043339_HUMAN 727 

O60765_HUMAN 728 

043338_HUMAN 729 

O43339_HUMAN 730

O43338_HUMAN 731

Q13398_HUMAN 732

O43340_HUMAN 733

O43340_HUMAN 734 

Q13398_HUMAN 735 

043339_HUMAN 736 

043338_HUMAN 737 

043339_HUMAN 738 

Q13398_HUMAN 739

O43340_HUMAN 740 

O43340_HUMAN 741 

O43340_HUMAN 742 

Z189_HUMAN 743 

O75802_HUMAN 744 

O43340_HUMAN 745

O43309_HUMAN 746

Q15776_HUMAN 747 

015535_HUMAN 748 

O60792_HUMAN 749 

Q15776_HUMAN 750 

ZN84_HUMAN 751 

Q15776_HUMAN 752 

Z189_HUMAN 753 

O75802_HUMAN 754 

Z189_HUMAN 755 

O75802_HUMAN 756 

ZN24_HUMAN 757

Z191_HUMAN 758

OZF_HUMAN 759

Q15776_HUMAN 760



YACPECGKSFSQIYSLNSHRKVH 

YECSKCGKSFKQSSSFSSHRKVH 

YECSECGKSFSHSTNLFRHWRVH 

YECSECGKSFSHSTNLYRHRSAH

YECSECGKSFSQSSGLLRHRRVH 

YKCSECGKSFSQSSGFLRHRKAH 

YECSECGKVFSQSSGLFRHRRAH 

YECDECGKSYSQSSALLQHRRVH

YECSECGKSFSQSSSLIQHRRVH 

YECGECGKSFSQRSNLMQHRRVH 

YECSECRKSFSRSSSLIQHWRIH 

YECSQCGKSFSRSSALIQHWRVH 

HECNECGKSESRSSSLIHHRRLH 

YKCGECGNSFSQSAILNQHRRIH 

YKCGDCGKSFSQSSILIQHRRIH 

YRCEECGISFGQSSALIQHRRIH 

YECGQCGKSFSLKCGLIQHQLIH 

YECGQCGKSFSQKSGLIQHQWH 

YDCGQCGKSFIQKSSLIQHQWH 

YQCSQCGKSFGCKSVLIQHQRVH 

YVCSECGKSFGQKSVLIQHQRVH 

YDCSECGKSFRQVSVLIQHQRVH 

YECSECSKSFSCKSNLIKHLRVH 

YECGQCGKSFSQKATLIKHQRVH 

YVCGQCGKSFSQRATLIKHHRVH

YECSQCGKSFSQKATLVKHQRVH 

YECSECGKSFSQNFSLIYHQRVH

YECSVCGKSFIRKTHLIRHQTVH 

YECSECEKSFSCKTDLIRHQTVH 

YECRECGKSFTRKNHLIQHKTVH 

HKCEECGKGFVRKAHFIQHQRVH 

HKCEECGKGFVRKAHFIQHQRVH 

HECSECGKSFSRKTHLTQHQRVH 

YQCKECGKSFSQSGIilQHQRIH 

YQCNQCGKAPSQSAGLILHQRIH 

YHCKECGKVFSQSAGLIQHQRIH 

YNCNECRKTFSQSTYLIQHQRIH 

YHCKECGKAFSQNTGLILHQRIH 

YGCNECGRAFSEKSNLINHQRIH 

YKCNECGRAFSQKSGLIEHQRIH 

HKCDECGKAFSRNSGLIQHQRIH 

HKCDECGKAFSRNSGLIQHQRIH 

HKCEECGKAFSRSSGLIQHQRIH 

HKCEECGKAFSRSSGLIQHQRIH 

YKCLECGKAPSQNSGLINHQRIH 

YKCLECGKAFSQNSGLINHQRIH

YQCSECGKAFSQKSHHIRHQKIH 

YQCNECGKAFIQRSSLIRHQRIH
ZN35_HUMAN


761 


YDCSECGKAFSQLSSLIVHQRIH 


ZN07_HUMAN 


762 


YRCEECGKAFGQSSSLIHHQRIH


O60765_HUMAN 


763 


FKCNTCGKTFRQSSSRIAHQRIH 


OZF_HUMAN 


764 


FKCSECGTAFGQKKYLIKHQNIH


OZF_HUMAN


765 


FECNECGKAFSQKQYVIKHQNTH 


Q92951_HUMAN 


766 


FECTHCGKSFRAKGNLVTHQRIH


OZF_HUMAN 


767 


FECNECGKSFSQKENLLTHQKIH 


ZN74_HUMAN 


768 


FKCNECGKAFSSHAYLIVHRRIH 


ZN74_HUMAN 


769 


FKCADCGKGFSCHAYLLVHRRIH 


O60765_HUMAN 


770 


FKCSECGRAFSQSASLIQHERIH 


ZN35_HUMAN 


771 


FECHECGKAFIQSANLWHQRIH 


ZN35_HUMAN 


772 


FTCSVCGKGFSQSANLWHQRIH 


ZN35_HUMAN


773 


FACNDCGKAFTQSANLIVHQRSH 


O14709_HUMAN


774 


YKCNECGKDFSQNKNLWHQRMH 


O14709_HUMAN 


775 


YKCDECGKTFAQTTYLIDHQRLH


O14709_HUMAN 


776 


YKCNECGKVFSQNAYLIDHQRLH 


014709_HUMAN 


777 


YKCTECGKAFTQSAYLFDHQRLH 


O14709_HUMAN 


778 


YKCNECGKAFSQSAYLLNHQRIH 


Z157_HUMAN


779 


YQCNECGKSFRVHSSLGIHQRIH 


O60765_HUMAN 


780 


YNCNECGECALSSHSTLIIHERIH 


EVI1_HUMAN


781 


YKCDQCPKAFNWKSNLIRHQMSH 


Q15776_HUMAN


782 


YQCNVCGKAFSYRSALLSHQDIH 


O43309_HUMAN 


783 


YECNECGKAFVYNSSLVSHQEIH


Z200_HUMAN 


784 


YGCKKCGRRFGRLSNCTRHEKTH 


O15361_HUMAN


785 


YGCKKCGRRFGRLSNCTRHEKTH 


ZN07_HUMAN 


786 


YKCNDCGKAFNRSSRLTQHQKIH


ZN74_HUMAN 


787 


YQCGSCGKAFTCHSSLTVHEKIH


ZN35_HUMAN 


788 


YVCSKCGKAFTQSSNLTVHQKIH 


Z140_HUMAN 


789 


YECIECGKAFRRFSHLTRHQSIH 


O60893_HUMAN


790 


YQCNMCGKAFRRNSHLLRHQRIH 


Q13396_HUMAN


791 


YSCTECEKSFVQKQHLLQHQKIH 


O43361_HUMAN


792 


YECTQCAKAFVRKSHLVQHEKIH 


043361_HUMAN 


793 


YECTECEKAFVRKSHLVQHQKIH 


075123_HUMAN 


794 


YECKECGKAFLQKAHLTEHQKIH 


O75290_HUMAN


795 


YECKECGKGFNRGAHLIQHQKIH 


O75290_HUMAN


796 


YECKECGKGFNRGAHLIQHQKIH 


O75290_HUMAN


797 


FECKECGKAFRLHMQLIRHQKLH 


O75290_HUMAN


798 


FECKECGKAFRLHMHLIRHQKLH 


O75290_HUMAN


799 


FECKECGKAFRLHIQPTRHQKFH


O75290_HUMAN 


800 


YECKECGKAFRLYLQLSQHQKTH 


Z140_HUMAN 


801 


YECTECGKAFSRASNLTRHQRIH 


043296_HUMAN 


802 


YECVECGKAFTRMSGLTRHKRIH 


043296_HUMAN 


803 


YECMECGKAFNRKSYLTQHQRIH


014913_HUMAN 


804 


HECVECGKRFSSSSRLQEHQKIH 


EVI1_HUMAN 


805 


HACPECGKTFATSSGLKQHKHIH 


015535_HUMAN 


806 


YECNECGKAFSRSSGLFNHRGIH


Z132_HUMAN


807 


YECNDCGKAFSNSSTLIQHQKVH 


Z132_HUMAN 


808 


YECIQCGKAFSERSTLVRHQKVH 
Z132_HUMAN 809

Z124_HUMAN 810 

ZN35_HUMAN 811

O60792_HUMAN 812 

O75467_HUMAN 813

O75467_HUMAN 814

OZF_HUMAN 815 

Z132_HUMAN 816 

O60765_HUMAN 817 

O60792_HUMAN 818 

043336_HUMAN 819 

O43336_HUMAN 820

095878_HUMAN 821 

O60792_HUMAN 822

O43309_HUMAN 823

043336_HUMAN 824 

O60893_HUMAN 825 

043338_HUMAN 826 

075123_HUMAN 827 

O60792_HUMAN 828 

ZN42_HUMAN 829

O14709_HUMAN 830 

O43361_HUMAN 831

O43361_HUMAN 832

043361_HUMAN 833 

Z157_HUMAN 834 

075123_HUMAN 835 

Q13398_HUMAN 836 

043361_HUMAN 837 

O43361_HUMAN 838

Z132_HUMAN 839

Q13396_HUMAN 840

075467_HUMAN 841 

Z165_HUMAN 842 

Z205_HUMAN 843

Z135_HUMAN 844

Z135_HUMAN 845

Z135_HUMAN 846

ZN74_HUMAN 847 

ZN74_HUMAN 848 

ZN35_HUMAN 849 

O43309_HUMAN 850 

ZN24_HUMAN 851 

Z191_HUMAN 852

043296_HUMAN 853 

ZN75_HUMAN 854 

O75290_HUMAN 855 

O75467_HUMAN 856
YECDECGKAFSNRSHLIRHEKVH
YECQKCGKAFSRASTLWKHKKTH 
FKCNECEKAFSYSSQLARHQKVH 
FECSECGKAFSYLSNLNQHQKTH 
FRCSECGKAFSHGSNLSQHRKIH 
FACPQCGRAFSHSSNLTQHQLLH 
FACKVCGKVFSHKSNLTEHEHFH 
YECSQCGKLFSHLCNLAQHKKIH 
YECNTCGKLFNHRSSLTNHYKIH 
YECAECGKAFRHCSSLAQHQKTH 
CECSECGKCFRHRTSLIQHQKVH 
CECNECGKVFSHQKRLLEHQKVH 
YECTECGRTFSDISNFGAHQRTH
YECNECGKAFSQHSNLTQHQKTH 
YHCNDCGKAFSQKAGLFHHIKIH 
YECSDCGKAFISKQTLLKHHKIH
YECDDCGKTFSQSCSLLEHHKIH 
FECDECGKSFSQRTTLNKHHKVH
YVCSYCGKGFIQRSNFLQHQKIH 
YTCNECGKAFSQRGHFMEHQKIH 
YTCDVCGKVFSQRSNLLRHQKIH 
YGCNDCSKVFRQRKNLTVHQKIH 
YVCSECGKAFLTQAHLDGHQKIQ 
YtCSECGKAFLTQAHLVGHQKIH 
YECTQCGKAFLTQAHLVGHQKTH 
YECGECAKTFSARSYLIAHQKTH 
YECNECGKAFFLSSYLIRHQKIH 
YECNECGKFFTYYSSFIIHQRVH
YKCSKCGKFFRYRCTLSRHQKVH 
YECNKCGKFFMYNSKLIRHQKVH 
YECNECGKFFSQNSILIKHQKVH
YECGYCGKSFSHPSDLVRHQRIH 
YACPVCGKAFRHSSSLVRHQRIH 
HQCNECGKAFRHSSKLARHQRIH 
YHCLDCGKSFSHSSHLTAHQRTH 
YACRDCGKAFTHSSSLTKHQRTH
YECNDCGKAFSHSSSLTKHQRIH 
YQCGECGKAFSHSSSLTKHQRIH 
FDCSQCWKAFSCHSSLIMHQRIH 
YTCGECGKAFSCHSSLNVHQRIH
YECKECGKAFSCFSHLIVHQRIH 
YKCNECGKAPGRWSALNQHQRLH 
YGCVECGKAFSRSSILVQHQRVH
YGCVECGKAFSRSSILVQHQRVH 
YKCSECGKAFSRSSSLTQHQRMH 
FKCQECGKSFRVSSDLIKHHRIH 
FVCKECGMAFRYHYQLIEHCQIH 
FVCTQCGRAFRERPALFHHQRIH 
ZN74_HUMAN


857 


FKCEKCGEMFNWSSHLTEHQRLH 


ZN85_HUMAN 


858 


FKCTKCGKSFGMISCLTEHSRIH 


ZN43_HUMAN 


859 


FKCKECGKSFCMLPHLAQHKIIH


Z195_HUMAN 


860 


FKCQECGKSFQMLSFLTEHQKIH 


ZN07_HUMAN 


861 


FKCDECGKAFRWISRLSQHQLIH 


Z189_HUMAN 


862


HKCGECGKAFRLSTYLIQHQKIH 


O75802_HUMAN 


863 


HKCGECGKAFRLSTYLIQHQKIH 


ZN07_HUMAN


864 


FKCTECGKAFRLSSKLIQHQRIH 


O75290_HUMAN 


865 


FECKECGKAFTLLTKLVRHQKIH 


O75290_HUMAN 


866 


FECKECGKVFSLPTQLNRHKNIH 


O75290_HUMAN 


867 


FECRECGKAFSLLNQLNRHKNIH 


O75290_HUMAN 


868 


FECKECEKAFSNRAHLIQHYIIH 


043296_HUMAN 


869 


FECKECGKAFSNRKDLIRHFSIH


O62425_CAEEL


870 


FVCKYCGKAFRQASTLCRHKIIH


075123_HUMAN 


871 


FECKDCGKAFIQSSKLLLHQIIH


O75290_HUMAN


872 


FECKECGKFFRRGSNLNQHRSIH


075290_HUMAN 


873 


FECKECGKSFNRSSNLVQHQSIH 


O75290_HUMAN 


874 


FECKECGKSFNRSSNLVQHQSIH


O75290_HUMAN 


875 


FECQDCGKAFNRGSSLVQHQSIH


094892_HUMAN 


876 


PVCSECRKAFSSKRNLIVHQRTH 


O14709_HUMAN


877 


FECSECGRAFSSNRNLIEHKRIH 


Z135_HUMAN 


878 


YECNQCGRASARATLLIEHQRIH


Z157_HUMAN 


879 


FECQECGKAFCRKAHLTEHQRTH 


Z157_HUMAN 


880 


FECNECGKAYCRKSNLVEHLRIH 


075123_HUMAN 


881 


FECNECGKAFIRSSKLIQHQRIH 


ZN42_HUMAN 


882 


FRCAECGQSFRQRSNLLQHQRIH 


ZN42_HUMAN 


883 


FACPECGQSFRQHANLTQHRRIH 


ZN42_HUMAN


884 


FACAECGQSFRQRSNLTQHRRIH


ZN42_HUMAN 


885 


- -CAECGKAFRQRPTLTQHLRVH 


ZN42_HUMAN 


886 


YACPECGKAFRQRPTLTQHLRTH 


014913_HUMAN 


887 


YKCEECGNSFYYPAMLKQHQRIH


Z174_HUMAN 


888 


YTCGECGNCFGRQSTLKLHQRIH 


PLZF_HUMAN


889 


YECEFCGSCFRDESTLKSHKRIH 


BCL6_HUMAN 


890 


YPCEICGTRFRHLQTLKSHLRIH 


043296_HUMAN 


891 


FECLECGKAFNHRSYLKRHQRIH 


043337_HUMAN 


892 


YKCLECGKAFKRRSYLMQHHPIH 


043296_HUMAN 


893 


YECLECGKVFKHRSYLMWHQQTH 


O75123_HUMAN


894 


YECKECGKAFRHRSDLIEHQRIH 


O43336_HUMAN


895 


YECKECGKAFIHKKRLLEHQRIH 


Z157_HUMAN 


896 


YECSECGNAFYVKVRLIEHQRIH


Z157_HUMAN 


897 


YECNECGNAFYVKARLIEHQRMH 


OZF_HUMAN 


898 


FVCKECGKTFSGKSNLTEHEKIH 


Z134_HUMAN 


899 


YKCSDCGKVFRHKSTLVQHESIH 


O60893_HUMAN 


900 


YECEDCGKTFIGSSALVIHQRVH 


043339_HUMAN 


901 


YECSECGKLFRQNSSLVDHQKIH 


043338_HUMAN 


902 


FECSECGKFFRQSYTLVEHQKIH 


O43338_HUMAN


903 


YECGECGKLFRQSFSLWHQRIH 


O43361_HUMAN


904 


YECSECGKLFMDSFTLGRHQRVH 
O43361_HUMAN


905 


YECSECGKFFRDSYKLIIHQRVH


043361_HUMAN 


906 


YECNECGKFFLDSYKLVIHQRIH 


043336_HUMAN 


907 


YECSECGKGFYLEVKLLQHQRIH 


ZN07_HUMAN 


908 


YECAECGKVFRLCSQLNQHQRIH 


Z132_HUMAN 


909 


HVCKECGKAFSHSSKLRKHQKFH 


TYY1_HUMAN 


910 


HVCAECGKAFVESSKLKRHQLVH


O15391_HUMAN


911 


HVCAECGKAFLESSKLRRHQLVH 


094892_HUMAN 


912 


HVCSECGKAFVKKSQLTDHERVH 


ZFX_HUMAN


913 


HICVECGKGFRHPSELKKHMRIH


ZFY_HUMAN 


914 


HICVECGKGFRYPSELRKHMRIH 


Q15558_HUMAN 


915 


HICVECGKGFRHPSELRKHMRIH 


Z135_HUMAN


916 


YECHECLKGFRNSSALTKHQRIH 


ZN74_HUMAN


917 


YTCGECGKAFRQSSSLTLHRRIH


Z174_HUMAN 


918 


YQCGQCGKSFRQSSNLHQHHRLH 


Z195_HUMAN


919 


YQCEECGKVFRTCSSLSNHKRTH 


HKR3_HUMAN 


920 


FQCHLCGKTFRTQASLDKHNRTH 


O43337_HUMAN


921 


YDCMACGKAFRCSSELIQHQRIH 


O60765_HUMAN


922 


YLCNECGNTFKSSSSLRYHQRIH 


O60765_HUMAN 


923 


YKCNECGKTFRCNSSLSNHQRIH 


Z140_HUMAN 


924 


YKCNECGKAFSSGSELIRHQITH 


Q14585_HUMAN 


925 


YECKECGKAFSFGSGLIRHQIIH


Q14585_HUMAN


926 


YICNECGKAFSFGSALTRHQRIH 


Q14585_HUMAN 


927 


YECKECGKSFSSGSALNRHQRIH 


Q14585_HUMAN 


928 


YECKACGMAFSSGSALTRHQRIH 


Q14585_HUMAN 


929 


YECKECGKSFSFESALIRHHRIH 


Q14585_HUMAN 


930


YECKECGKTFSSGSDLTQHHRIH 


Q14585_HUMAN 


931 


YVCKECGKAFNSGSDLTQHQRIH 


Q14585_HUMAN 


932 


YECKECGKAFYSGSSLTQHQRIH 


Q14585_HUMAN


933 


FECKECGKAFGSGSNLTHHQRIH 


Q14585_HUMAN


934 


YECKECGKAFGSGANLAYHQRIH 


Q14585_HUMAN 


935 


YECIDCGKAFGSGSNLTQHRRIH 


Q14585_HUMAN 


936 


YECKECGKAFGSGSKLIQHQLIH 


Q14585_HUMAN 


937 


YECKECEKAFRSGSKLIQHQRMH 


ZN80_HUMAN 


938 


YECKECGKTFYYNSSLTRHMKIH 


ZN80_HUMAN 


939 


YECKECGKGFYYSYSLTRHTRSH 


Z165_HUMAN 


940 


YECNECGKSFAESSDLTRHRRIH 


Z202_HUMAN 


941 


YKCTICGKSFSQKSVLTTHQRIH


043167_HUMAN 


942 


YTCEICGKSFTAKSSLQTHIRIH


Q92618_HUMAN 


943 


HTCCICGKSFPFQSSLSQHMRKH


Q15776_HUMAN 


944 


HKCDECGKSFAQSSGLVRHWRIH 


015535_HUMAN 


945 


HKCDECGKSFTQSSGLIRHQRIH 


O60893_HUMAN 


946 


HYCHECGKSFAQSSGLTKHRRIH 


ZN24_HUMAN


947 


HICDECGKHFSQGSALILHQRIH


Z191_HUMAN 


948 


HICDECGKHFSQGSALILHQRIH 


Z140_HUMAN


949 


YACKECGKTFSQISNLVKHQMIH


Q14585_HUMAN 


950 


YECKECGKDFSFVSVLVRHQRIH 


075123_HUMAN 


951 


FECKECGKGFSQSSLLIRHQRIH 


UKLF_HUMAN


952 


FKCiraCDRCFSRSDHLALHMKRH 
O95600_HUMAN 953

SP2_HUMAN 954 

SP4_HUMAN 955 

O60402_HUMAN 956 

075411_HUMAN 957 

Q13118_HUMAN 958 

O14901_HUMAN 959

BTE1_HUMAN 960

SP2_HUMAN 961

SP4_HUMAN 962 

O60402_HUMAN 963

EZF_HUMAN 964 

O95600_HUMAN 965 

UKLF_HUMAN 966 

EKLF_HUMAN 967 

BTE2_HUMAN 968 

O14901_HUMAN 969

Q13118_HUMAN 970 

075411_HUMAN 971 

BTE1_HUMAN 972 

EGR4_HUMAN 973 

EGR2_HUMAN 974 

EGR1_HUMAN 975 

EGR3_HUMAN 976

Q16256_HUMAN 977 

WT1_HUMAN 978 

Q15881_HUMAN 979 

Q15881_HUMAN 980

Q16256_HUMAN 981 

WT1_HUMAN 982 

EGR4_HUMAN 983 

EGR3_HUMAN 984 

EGR2_HUMAN 985 

EGR1_HUMAN 986 

EVI1_HUMAN 987

095878_HUMAN 988 

Z140_HUMAN 989 

O60893_HUMAN 990 

Z135_HUMAN 991 

095878_HUMAN 992 

ZN80_HUMAN 993 

ZN80_HUMAN 994

Z135_HUMAN 995 

Z135_HUMAN 996 

Z263_HUMAN 997

Z263_HUMAN 998

Z202_HUMAN 999 

ZN74_HUMAN 1000
FRCTDCNRSFSRSDHLSLHRRRH

YACAQCQKRFiVlRSDHLTKHyKTH 

YACPECSKRFMRSDHLSKHVKTH 

YACPECSKRFMRSDHLSKHVKTH 

YACPMCDRRFMRSDHLTKHARRH 

YACPMCDRRFMRSDHLTKHARRH 

YACPVCDRRFMRSDHLTKHARRH 

YACPLCEKRFMRSDHLTKHARRH 

FVCNWFFCGKRFTRSDELQRHARTH 

FICNWMPCGKRFTRSDELQRHRRTH 

FICNVinyiFCGKRFTRSDELQRHRRTH 

YHCDWDGCGWKFARSDELTRHYRKH 

YKCTWDGCSWKFARSDELTRHFRKH 

YKCSWEGCEWRFARSDELTRHYRKH 

YACTWEGCGWRFARSDELTRHYRKH 

YKCTWEGCDWRFARSDELTRHYRKH 

FNCSWDGCDKKFARSDELSRHRRTH 

FSCSWKGCERRFARSDELSRHRRTH 

FSCSWKGCERRFARSDELSRHRRTH 

FPCTWPDCLKKFSRSDELTRHYRTH 

FACPVESCVRSFARSDELNRHLRIH 

YPCPAEGCDRRFSRSDELTRHIRIH 

YACPVESCDRRFSRSDELTRHIRIH 

HACPAEGCDRRFSRSDELTRHLRIH 

YQCDFKDCERRFFRSDQLKRHQRRH 

YQCDFKDCERRFSRSDQLKRHQRRH 

YQCDFKDCERRFSRSDQLKRHQRRH 

FQCKACQRKFSRSDHLKTHTRTH 

FQCKTCQRKFSRSDHLKTHTRTH 

FQCKTCQRKFSRSDHLKTHTRTH 

FQCRICLRNFSRSDHLTSHVRTH 

FQCRICMRSFSRSDHLTTHIRTH

FQCRICMRNFSRSDHLTTHIRTH 

FQCRICMRNFSRSDHLTTHIRTH

YTCRYCGKIPPRSANLTRHLRTH 

YRCTVCGKHFSRSSNLIRHQKTH 

YVCKVCNKSFSWSSNLAKHQRTH 

YECEECGKVFSHSSNLIKHQRTH 

YECSECGKSFSFRSSFSQHERTH

YICCECGKSFSNSSSFGVHHRTH 

CKCSECGKTFTYRSVFFRHSMTH 

YECSECGKTFSYHSVFIQHRVTH 

YGCNECGKSFSHSSSLSQHERTH 

YGCNECGKTPSHSSSLSQHERTH 

YKCPECGKSFSRSSHLVIHERTH 

YKCSECGESPSRSSRLMSHQRTH 

CRCNECGKSPSRRDHLVRHQRTH 

FKCSDCEKAPNSRSRLTLHQRTH 
ZN42_HUMAN 1001

Z205_HUMAN 1002

ZN75_HUMAN 1003

ZN07_HUMAN 1004 

O15090_HUMAN 1005 

094892_HUMAN 1006 

O95270_HUMAN 1007 

GFI1_HUMAN 1008 

Z135_HUMAN 1009 

O60765_HUMAN 1010 

O60765_HUMAN 1011

O60792_HUMAN 1012 

Z151_HUMAN 1013

EVI1_HUMAN 1014

Z205_HUMAN 1015 

Z205_HUMAN 1016

Z124_HUMAN 1017 

Z200_HUMAN 1018

O15361_HUMAN 1019

ZN07_HUMAN 1020 

Z263_HUMAN 1021 

Q13134_HUMAN 1022 

Q13127_HUMAN 1023 

CTCF_HUMAN 1024 

Q99592_HUMAN 1025 

Q13397_HUMAN 1026 

O60765_HUMAN 1027

ZN74_HUMAN 1028 

ZN75_HUMAN 1029

Z189_HUMAN 1030 

O75802_HUMAN 1031

Z186_HUMAN 1032 

Z186_HUMAN 1033 

ZN84_HUMAN 1034

O60792_HUMAN 1035

O75066_HUMAN 1036 

095878_HUMAN 1037 

P91805_SARPE 1038 

Z133_HUMAN 1039 

Z133_HUMAN 1040 

043336_HUMAN 1041 

075467_HUMAN 1042 

Z124_HUMAN 1043 

Z177_HUMAN 1044 

Z177_HUMAN 1045 

ZN84_HUMAN 1046 

Z135_HUMAN 1047

Z135_HUMAN 1048



FACPECGQRFSQRLKLTRHQRTH 

YPCPECGKCFSQRSNLIAHNRTH

FKCDECGKRFIQNSHLIKHQRTH

FKCDECGKGFVQGSHLIQHQRIH 

YPCPLCGKRFRFNSILSLHMRTH 

YRCSECGKGFIVNSGLMLHQRTH

HKCQVCGKAFSQSSNLITHSRKH

HKCQVCGKAFSQSSNLITHSRKH 

YKCQECGKAFSHSSALIEHHRTH 

FKCKECSKAFSQSSALTQHQITH

CKCKVCGKAFRQSSALIQHQRMH 

CKCNECGKAFSYCSALIRHQRTH 

YVCERCGKRFVQSSQLANHIRHH 

YECENCAKVFTDPSNLQRHIRSQH 

YVCDRCAKRFTRRSDLVTHQGTH 

HKCPICAKCFTQSSALVTHQRTH 

YGCTICEKVFNIPSSFQIHQRNH 

YTCPLCGKQFNESSYLISHQRTH

YTCPLCGKQFNESSYLISHQRTH

YKCNKCTKAFGCSSRLIRHQRTH

YQCNICGKCFSCNSNLHRHQRTH 

YKCELCPYSSSQKTHLTRHMRTH

YKCELCPYSSSQKTHLTRHMRTH 

FQCSLCSYASRDTYKLKRHMRTH 

YTCSLCGKTFSCMYTLKRHERTH 

YTCSLCGKTFSCMYTLKRHERTH 

YKCSLCEKTFINTSSLRKHEKNH

YKCSACEKAFSCSSLLSMHLRVH 

YKCQQCDRRFRWSSDLNKHFMTH 

YQCNQCKQSFSQRRSLVKHQRIH 

YQCNQCKQSFSQRRSLVKHQRIH 

YACNCCEKLFSYKSSLTIHQRIH 

YACDHCEKAFSHKSKLTVHQRTH 

YECRDCEKAFSQKSQLNTHQRIH 

YQCNKCEKTFSQSSHLTQHQRIH 

YACQYCDAVFAQSIELSRHVRTH 

YRCDICGKSFSQSATLAVHHRTH

YQCKVCQKRFPQLSTLHNHERTH 

YACKECGRCFRQRTTLVNHQRTH 

YVCGVCGHSFSQNSTLISHRRTH

YVCIECGKSLSSKYSLVEHQRTH 

YACAQCGRRFCRNSHLIQHERTH 

YECKQCGKAFSRSSHLRDHERTH 

YECNQCGKSFSTGSYLIVHKRTH 

YECDHCGKSFSQSSHLNVHKRTH 

YACGNCGKTFPQKSQFITHHRTH 

YECHECGKAFTQITPLIQHQRTH

YECNQCGRAFSQLAPLIQHQRIH 
Z135_HUMAN


1049 


YKCTQCGRTFNQIAPLIQHQRTH 


O60893_HUMAN 


1050 


YQCDTCGKGFTRTSYLVQHQRSH 


O43337_HUMAN


1051 


YKCKQCGKGFNRKWYLVRHQRVH 


Z205_HUMAN 


1052 


YRCEQCGKGFSWHSHLVTHRRTH 


Z202_HUMAN 


1053 


YRCDDCGKHFRWTSDLVRHQRTH


ZN45_HUMAN 


1054 


YRCDVCGKRFRQRSYLQAHQRVH 


ZN45_HUMAN 


1055 


YQCDACGKGFSRSSDFNIHFRVH 


Z239_HUMAN 


1056 


YQCYECGKGFSQSSDLRIHLRVH 


Z239_HUMAN 


1057 


YKCDKCGKGFSQSSKLHIHQRVH 


Z239_HUMAN 


1058 


YHCGKCGKGFSQSSKLLIHQRVH 


Z239_HUMAN


1059 


YKCGECGKGFSQSSNLHIHRCIH 


015322_HUMAN 


1060 


YKCDMCGKEFSQSSCLQTHERVH


Z239_HUMAN 


1061 


YACQYCGKNFSQSSELLLHQRDH 


ZN07_HUMAN 


1062 


YPCKECGKAFSQSSTLAQHQRMH


Z133_HUMAN 


1063 


YVCKTCGRGFSLKSHLSRHRKTH 


Z133_HUMAN 


1064 


YVCGVCGRGFSLKSHLNRHQNIH 


Z133_HUMAN 


1065 


YVCGVCEKGFSLKKSLARHQKAH 


EVI1_HUMAN 


1066 


YRCKYCDRSFSISSNLQRHVRNIH


RRE1_HUMAN


1067 


YKCQTCERTFTLKHSLVRHQRIH 


O75850_HUMAN 


1068 


YACAQCGRRFSRKSHLGRHQAVH 


O75850_HUMAN 


1069 


HACAVCARSFSSKTNLVRHQAIH


O75850_HUMAN


1070 


YQCAQCARSFTHKQHLVRHQRVH 


ZN42_HUMAN 


1071 


FVCSECGRSFSRSSHLLRHQLTH


Z132_HUMAN


1072 


FECSECGRDFSQSSHLLRHQKVH 


ZN35_HUMAN


1073 


YECEKCGAAFISNSHLMRHHRTH


Z132_HUMAN


1074 


YECSECGRAFSSNSHLVRHQRVH


Z202_HUMAN


1075 


YKCMECGKSYTRSSHLARHQKVH


Z134_HUMAN


1076 


YECSECGKAYSLSSHLNRHQKVH


Z239_HUMAN


1077 


YECSKCGKGFSQSSNLHSHQRVH 


Z165_HUMAN


1078 


YECSECGRAFSQSSNLSQHQRIH 


Z132_HUMAN


1079 


YECSECGRAFNNNSNLAQHQKVH


Z239_HUMAN


1080 


YECEECGMSFSQRSNLHIHQRDH 


O00153_HUMAN


1081 


HQCQVCGKTFSQSGSRNVHMRKH 


Q13398_HUMAN 


1082 


YVCGECGKSFSHSSNLKNHQRVH


O15322_HUMAN


1083 


YKCEICGKSFCLRSSLNRHYMVH


O75123_HUMAN


1084 


FKCAQCGKAFCHSSDLIRHQRVH


O14913_HUMAN


1085 


YKCEECDKAFLYHSFLRRHKAVH 


O14913_HUMAN


1086 


YKCEECDKAFLHHSYLRKHQAVH


ZNa3 HUMAN 


1087 



X* XVV^XM a-iV—VjXVX-IX: xvuXM O X XJ V XS_L1\^A\.I7 xl 


015322_HUMAN 


1088 


HTCNECGKSFCYISALRIHQRVH


O60792_HUMAN 


1089 


FGCNDCGKSFRYRSALNKHQRLH 


Z137_HUMAN 


1090 


YKCNKCGKIFRHRSYLAVYQRTH 


O75123_HUMAN


1091 


YVCNVCGKDFIHYSGLIEHQRVH


Z134_HUMAN 


1092 


YKCNECGKYFSHHSNLIVHQRVH 


043361_HUMAN 


1093 


FECSICGKFFSHRSTLNMHQRVH 


Z134_HUMAN


1094 


FECIECGKFFSRSSDYIAHQRVH


Z134_HUMAN 


1095 


FVCSKCGKDFIRTSHLVRHQRVH 


O14913_HUMAN


1096 


YKCQECGKSFCYRSYLRBHYRMH 
Z174_HUMAN 1097

O60765_HUMAN 1098 

043167_HUMAN 1099 

O43829_HUMAN 1100

O00403_HUMAN 1101

075626_HUMAN 1102 

O15322_HUMAN 1103

BCL6_HUMAN 1104

Z195_HUMAN 1105 

ZN85_HUMAN 1106

Z239_HUMAN 1107 

Z239_HUMAN 1108 

015322_HUMAN 1109 

O15322_HUMAN 1110

014913_HUMAN 1111 

014913_HUMAN 1112 

014913_HUMAN 1113 

ZN45_HUMAN 1114 

ZN45_HUMAN 1115

ZN45_HUMAN 1116 

ZN45_HUMAN 1117 

ZN45_HUMAN 1118

ZN45_HUMAN 1119

ZN45_HUMAN 1120 

ZN45_HUMAN 1121

ZN45_HUMAN 1122 

ZN45_HUMAN 1123 

ZN45_HUMAN 1124 

ZN45_HUMAN 1125 

075467_HUMAN 1126 

ZN42_HUMAN 1127 

060765_HUMAN 1128 

TYY1_HUMAN 1129

O15391_HUMAN 1130

TYY1_HUMAN 1131

015391_HUMAN 1132 

Q14872_HUMAN 1133 

GLI1_HUMAN 1134 

GLI3_HUMAN 1135 

060255_HUMAN 1136 

O60254_HUMAN 1137 

O60253_HUMAN 1138

O60252_HUMAN 1139

GLI2_HUMAN 1140

O95409_HUMAN 1141

Q15915_HUMAN 1142

ZIC3_HUMAN 1143

GLI1_HUMAN 1144
YKCDDCGKSFTWNSELKRHKRVH
YRCKECGKSFSRRSGLFIHQKIH 
YSCGICGKSFSDSSAKRRHCILH
FVCEMCTKGFTTQAHLKEHLKIH
FVCEMCTKGFTTQAHLKEHLKIH 
FKCQTCNKGFTQLAHLQKHYLVH 
FKCEQCGKGFRCRAILQVHCKLH 
YKCETCGARFVQVAHLRAHVLIH 
YKCEKCGKAFTQFSHLTVHESIH
YKCKKCGKAFNQSAHLTTHEVIH 
YKCEKCGKGFTRSSSLLIHHAVH 
YKCEQCGKGFTRSSSLLIHQAVH 
YKCEECGKGFTDSLDLHKHQIIH
YICEKCGRAFIHDLKLQKHQIIH
YKCEKCGKGFFRSSDLQHHQKIH
YKCEECGKCFSSFTSLKRHQIIH
YPYKCEECGKGFSRSSKLQEHQTIH
YKGEHCVKSFSWSSHLQINQRAH
YKCEECGKGFSWSSSLIIHQRVH
YKCEECGKVFSWSSYLQAHQRVH
YKCEKCDNAFRRFSSLQAHQRVH
YKCERCGKAFSQFSSLQVHQRVH 
YKCEECGVGFSQRSYLQVHLKVH 
YKCEECGKSFSWRSRLQAHERIH 
YKCEECGKGFSVGSHLQAHQISH
YQCAECGKGFSVGSQLQAHQRCH 
YQCEECGKGFCRASNFLAHRGVH 
YKCEECGKGFCRASNLLDHQRGH 
YKCEECGKGFSQASNLLAHQRGH 
FVCALCGAAFSQGSSLFKHQRVH 
YHCGECGLGFTQVSRLTEHQRIH 
YRCNECGKGFTSISRLNRHRIIH
YVCPFDGCNKKFAQSTNLKSHILTH 
FVCPFDVCNRKFAQSTNLKTHILTH 
FQCTFEGCGKRFSLDFNLRTHVRIH
FQCTFEGCGKRFSLDFNLRTHLRIH 
YQCTFEGCPRTYSTAGNLRTHQKTH 
HKCTFEGCRKSYSRLENLKTHLRSH 
HKCTFEGCTKAYSRLENLKTHLRSH 
HKCTFEGCSKAYSRLENLKTHLRSH 
HKCTFEGCSKAYSRLENLKTHLRSH 
HKCTFEGCSKAYSRLENLKTHLRSH 
HKCTFEGCSKAYSRLENLKTHLRSH 
HKCTFEGCSKAYSRLENLKTHLRSH 
FQCEFEGCDRRFANSSDRKKHMHVH 
FKCEFEGCDRRFANSSDRKKHMHVH 
FKCEFEGCDRRFANSSDRKKHMHVH
YMCEHEGCSKAFSNASDRAKHQNRTH 
O60255_HUMAN 1145

O60254_HUMAN 1146

O60253_HUMAN 1147

O60252_HUMAN 1148 

GLI3_HUMAN 1149 

GLI2_HUMAN 1150 

Z143_HUMAN 1151 

TF3A_HUMAN 1152 

TF3A_HUMAN 1153 

Q14872_HUMAN 1154

Q14872_HUMAN 1155 

ZN76_HUMAN 1156

Z143_HUMAN 1157 

Q14872_HUMAN 1158

O00153_HUMAN 1159

ZN76_HUMAN 1160

Z143_HUMAN 1161

Q15915_HUMAN 1162 

O95409_HUMAN 1163 

ZIC3_HUMAN 1164 

ZN76_HUMAN 1165

Z143_HUMAN 1166

O00153_HUMAN 1167

ZN76_HUMAN 1168

Z143_HUMAN 1169 

Q14872_HUMAN 1170 

ZN76_HUMAN 1171 

Z143_HUMAN 1172

BTE1_HUMAN 1173

BTE2_HUMAN 1174

O43839_HUMAN 1175

UKLF_HUMAN 1176 

O95600_HUMAN 1177 

Q13118_HUMAN 1178 

O75411_HUMAN 1179

EZF_HUMAN 1180

O14901_HUMAN 1181 

SP4_HUMAN 1182 

O60402_HUMAN 1183 

EKLF_HUMAN 1184 

WT1_HUMAN 1185 

Q16256_HUMAN 1186

Q15881_HUMAN 1187

SP2_HUMAN 1188 

043167_HUMAN 1189 

O75467_HUMAN 1190

ZEP1_HUMAN 1191 

Q02646_HUMAN 1192
YVCEHEGCNKAFSNASDRAKHQNRTH

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCTVPGCDKRFTEYSSLYKHHWH 

FKCTQEGCGKHFASPSKLKRHAKAH 

FVCDYEGCGKAFIRDYHLSRHILTH

FECDVQGCEKAFNTLYRLKAHQRLH 

FVCNQEGCGKAFLTSHSLRIHVRVH 

YRCDFPSCGKAFATGYGLKSHVRTH 

YQCEHAGCGKAFATGYGLKSHVRTH

FRCDHDGCGKAFAASHHLKTHVRTH 

FICPAEGCGKSFYVLQRLKVHMRTH 

FQCPFEGCGRSFTTSNIRKVHVRTH 

FKCPFEGCGRSFTTSNIRKVHWTH 

FPCPFPGCGKVFARSENLKIHKRTH 

FPCPFPGCGKVFARSENLKIHKRTH 

FPCPFPGCGKIFARSENLKIHKRTH 

YTCPEPHCGRGFTSATNYKNHVRIH 

YYCTEPGCGRAFASATNYKNHVRIH 

FMCHESGCGKQFTTAGNLKNHRRIH 

YKCPEELCSKAFKTSGDLQKHVRTH 

YRCSEDNCTKSFKTSGDLQKHIRTH 

PNCESEGCSKYFTTLSDLRKHIRTH

FRCGYKGCGRLYTTAHHLKVHERAH 

FRCEYDGCGKLYTTAHHLKVHERSH 

HKCPYSGCGKVYGKSSHLKAHYRVH 

HYCDYPGCTKVYTKSSHLKAHLRTH

HRCHFNGCRKVYTKSSHLKAHQRTH 

HRCQFNGCRKVYTKSSHLKAHQRTH

HQCDFAGCSKVYTKSSHLKAHRRIH 

HICSHPGCGKTYFKSSHLKAHTRTH 

HICSHPGCGKTYFKSSHLKAHTRTH 

HTCDYAGCGKTYTKSSHLKAHLRTH

YVCSFPGCRKTYPKSSHLKAHLRTH 

HICHIEGCGKVYGKTSHLRAHLRWH 

HICHIEGCGKVYGKTSHLRAHLRWH 

HTCAHPGCGKSYTKSSHLKAHLRTH

FMCAYPGCNKRYPKLSHLQMHSRKH 

FMCAYPGCNKRYFKLSHLQMHSRKH 

FMCAYPGCNKRYPKLSHLQMHSRKH 

HVCHIPDCGKTFRKTSLLRAHVRLH

YACKDCHRKFMDVSQLKKHLRTH 

YACRACSKVFVKSSDLLKHLRTH 

YICEYCNRACAKPSVLLKHIRSH 

YICPYCSRACAKPSVLKKHIRSH 
O75362_HUMAN 1193

Q92981_HUMAN 1194 

O76019_HUMAN 1195 

RRE1_HUMAN 1196

075626_HUMAN 1197 

Z202_HUMAN 1198 

075123_HUMAN 1199 

Z151_HUMAN 1200 

SNAI_HUMAN 1201 

043623_HUMAN 1202 

O95409_HUMAN 1203

ZIC3_HUMAN 1204 

O00146_HUMAN 1205

O00146_HUMAN 1206

IKAR_HUMAN 1207 

CTCF_HUMAN 1208

HKR3_HUMAN 1209

Q15552_HUMAN 1210 

043591_HUMAN 1211 

PLZF_HUMAN 1212

Z151_HUMAN 1213 

MAZ_HUMAN 1214

014753_HUMAN 1215 

O95365_HUMAN 1216

015156_HUMAN 1217 

O75066_HUMAN 1218 

095365_HUMAN 1219 

015156_HUMAN 1220 

Z151_HUMAN 1221 

Z151_HUMAN 1222

Z151_HUMAN 1223 

O15090_HUMAN 1224
YACSYCGKFFRSNYYLNIHLRTH

YKCVQPDCGKAFVSRYKLMRHMATH 

YKCVQPDCGKAFVSRYKLMRHMATH 

YACSVCNKRPWSLQDLTRHMRSH

HECQVCHKRFSSTSNLKTHLRLH

HDCSVCGKSFTCNSHLVRHLRTH

YACDICGKTFTFNSDLVRHRISH 

HKCSVCSKAFVNVGDLSKHI IIH 

YACVCGTCGKAFSRPWLLQGHVRTH 

YACVCKICGKAFSRPWLLQGHIRTH

HVCFWEECPREGKPFKAKYKLVNHIRVH

HVCYWEECPREGKSFKAKYKLVNHIRVH 

HECKLCGASFRTKGSLIRHHRRH 

HVCQFCSRGFREKGSLVRHVRHH 

FQCNQCGASFTQKGNLLRHIKLH

HKCHLCGRAFRTVTLLRNHLNTH 

HVCEFCSHAFTQKANLNMHLRTH 

HVCEHCNAAFRTNYHLQRHVFIH 

HVCEHCNAAFRTNYHLQRHVFIH 

YICSECNRTFPSHTALKRHLRSH 

YVCIHCQRQFADPGALQRHVRIH

YICALCAKEFKNGYNLRRHEAIH

HLCTGCGKGFNDTFDLKRHVRTH 

YECNICKVRFTRQDKLKVHMRKH

YACEVCGVRFTRNDKLKIHMRKH 

YSCEECGAKFAANSTLKNHLRLH 

YLCQQCGAAFAHNYDLKNHMRVH

YSCPHCPARFLHSYDLKNHMHLH 

HKCEDCGKEFTHTGNFKRHIRIH 

YRCEDCGKLFTTSGNLKRHQLVH 

YKCRECGKQFTTSGNLKRHLRIH 

YDCPYCGKTFRTSHHLKVHLRIH 
Example 3: Non-human zinc finger databases. 

To provide novel combinations of non-antigenic, optimised zinc fingers for use in 
species other than humans, separate species-specific zinc finger databases are required, 
for example for mouse, chicken, pig, cow, etc. 

The fingers listed below are in a format that can be linked with classical wild-type 
canonical "TGEKP" linkers (i.e. ... TGEKP - zinc finger peptide sequence - TGEKP - 
zinc finger peptide sequence - TGEKP - etc.). For each peptide sequence, an 
oligonucleotide is designed to encode the peptide sequence; the oligonucleotide can then 
be linked into a library selection system, as described in the Examples infra. 
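
The linking and encoding steps described above can be illustrated with a minimal sketch. It joins finger peptides from the database with the canonical TGEKP linker and reverse-translates the result into one possible coding oligonucleotide. The codon table and the function names `join_fingers` and `encode` are assumptions made for illustration only; the disclosure does not specify codon choices, and a practical design would follow the codon preferences of the intended expression host.

```python
# Hypothetical one-codon-per-residue table (an assumption for illustration;
# the source text does not prescribe any particular codon usage).
CODON = {
    "A": "GCT", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTC",
    "G": "GGT", "H": "CAT", "I": "ATC", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "TCT", "T": "ACC", "V": "GTT", "W": "TGG", "Y": "TAC",
}

LINKER = "TGEKP"  # canonical wild-type linker named in the text


def join_fingers(fingers, linker=LINKER):
    """Concatenate finger peptides, placing the linker between neighbours."""
    return linker.join(fingers)


def encode(peptide):
    """Reverse-translate a peptide into one possible coding oligonucleotide."""
    return "".join(CODON[residue] for residue in peptide)


# Two entries taken from the mouse database (SEQ ID NOs 1225 and 1226).
fingers = ["HQCTHCEKTFNRKDHLKNHLQTH", "HRCEYCKKGFRRPSEKNQHIMRH"]
array = join_fingers(fingers)
oligo = encode(array)
print(array)
print(oligo)
```

The sketch places linkers only between fingers; flanking linkers, as in the "... TGEKP - finger - TGEKP ..." format, could be added by prepending and appending `LINKER` before encoding.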

Mouse Zinc Finger Database. 

544 zinc finger units 



Name 


SEQ ID NO 


Peptide sequence 


O35745_MOUSE


1225 


HQCTHCEKTFNRKDHLKNHLQTH 


ZFX2_MOUSE


1226 


HRCEYCKKGFRRPSEKNQHIMRH 


ZFX1_MOUSE


1227 


HRCEYCKKGFRRPSEKNQHIMRH 


ZFY2_MOUSE


1228 


HKCDMCSKGFHRPSELKKHVATH 


ZFY1_MOUSE


1229 


HKCDMCSKGFHRPSELKKHVATH 


ZFX2_MOUSE


1230 


HKCDMCDKGFHRPSELKKHVAAH 


ZFX1_MOUSE


1231 


HKCDMCDKGFHRPSELKKHVAAH 


ZFA_MOUSE 


1232 


HKCDMCDKGFHRPSELKKHVAAH 


Q9Z162_MOUSE


1233 


YTCSVCGKGFSRPDHLSCHVKHVH 


MAZ_MOUSE


1234 


YNCSHCGKSFSRPDHLNSHVRQVH 


Q08376_MOUSE 


1235 


YSCEVCGKSFIRAPDLKKHERVH 


Z151_MOUSE


1236 


HKCPHCDKKFNQVGNLKAHLKIH 


ZFX2_MOUSE


1237 


FRCKRCRKGFRQQSELKKHMKTH 


ZFX1_MOUSE


1238 


FRCKRCRKGFRQQSELKKHMKTH 


Q62518_MOUSE


1239 


YVCTMCGKGYTLNSNLQVHLRVH 


Q60636_MOUSE 


1240 


YECNVCAKTFGQLSNLKVHLRVH 


Q9Z117_MOUSE 


1241 


CSCPECGKVLHQLSHLRSHYRLH 


Q61898_MOUSE 


1242 


CSCPECGREFHQLSHLRKHYRLH 


O88631_MOUSE


1243 


YSCQYCGKVFHQLSHFKSHFTLH 


Q61164_MOUSE 


1244 


HKCPDCDMAFVTSGELVRHRRYKH 


035483_MOUSE 


1245 


FRCADCGRGFAQRSNLAKHRRGH 


O35483_MOUSE


1246 


FVCGVCGAGFSRRAHLTAHGRAH 


O70162_MOUSE 


1247 


FVCRDCGQGFVRSARLEEHRRVH 


Q9Z1D8_MOUSE


1248 


HRCGDCGKFFLQASNFIQHRRIH 


035483_MOUSE 


1249 


HRCPDCGKGFGHSSDFKRHRRTH 


O35483_MOUSE


1250 


ADCGKSFVYGSHLARHRRTH 
O35483_MOUSE

O88282_MOUSE

Q61065_MOUSE

BCL6_MOUSE

O70162_MOUSE

O70162_MOUSE 

Q9Z0G7_MOUSE 

Q08376_MOUSE 

Q64318_MOUSE 

Q64318_MOUSE 

Q9Z1D8_MOUSE

Q9Z1D8_MOUSE

Q9Z2X6_MOUSE

KID1_MOUSE

Q9Z1D7_MOUSE

ZF90_MOUSE

Q9Z2X6_MOUSE

Q9Z2X6_MOUSE 

Q9Z2X6_MOUSE 

Q9Z2X6_MOUSE 

Q9Z2X6_MOUSE 

Q9Z2X6_MOUSE 

Q9Z2X6_MOUSE 

ZF37_MOUSE 

Q62514_MOUSE

Q61491_MOUSE

ZF37_MOUSE

Q62514_MOUSE

Q61491_MOUSE 

Q61491_MOUSE 

Q61491_MOUSE 

Q61491_MOUSE

Q61491_MOUSE

Q61491_MOUSE

Q61491_MOUSE 

Q61491_MOUSE 

Q61491_MOUSE

Q9Z2X6_MOUSE

Q9Z2X6_MOUSE

Q61491_MOUSE

Q9Z2X6_MOUSE

Q61491_MOUSE

Q64247_MOUSE 

Q9Z2X6_MOUSE 

Q9Z2X6_MOUSE 

Q64247_MOUSE

Q64247_MOUSE

Q64247_MOUSE



1251 FPCPDCGKRFVYKSHLVTHRRIH 

1252 YKCQLCRSAFRYKGNLASHRTVH 

1253 YKCDRCQASFRYKGNLASHKTVH

1254 YKCDRCQASFRYKGNLASHKTVH

1255 FACQDCGRRFNQSTKLIQHQRVH

1256 - - CVECGERFGRRSVLLQHRRVH 

1257 - DCPVCNKKFKMKHHLTEHMKTH 

1258 - - - HMCDKAFKHKSHLKDHERRH 

1259 HECGICRKAFKHKHHLIEHMRLH 

1260 FKCTECGKAFKYKHHLKEHLRIH 

1261 FKCNECGKGFGRRSHLAGHLRLH 

1262 YGCNECGKSFGRHSHLIEHLKRH

1263 YVCKQCGKAFTLSSSLRRH

1264 YVCKECGKAFTLSTSLYKHLRTH

1265 HGCDECGKSPTQHSRLIEHKRVH

1266 YRCNLCGRSFRHSTSLTQHEVTH 

1267 YVCKECGKAFARSTSLHIHEGTH 

1268 YVCKHCGKAYTTYNTLRAHERSH 

1269 YVCKHCGKAYTTYNTLRAHERSH 

1270 YVCKHCGKAYTSYSTLRAHERSH

1271 YVCKHCGKAYTSYSTLRAHERSH

1272 YVCKHCGKAYTSYSTLRAHERSH

1273 YVCKHCGKAFTQSSYLRIHKRTH 

1274 YECEQCGKAHGHKHALTDHLRIH 

1275 YECEQCGKAHGHKHALTDHLRIH 

1276 YECNQCGKAFTQFFPLKRHEITH 

1277 YKCDECGKAFGHSSSLTYHMRTH 

1278 YKCDECGKAFGHSSSLTYHMRTH 

1279 YQCNQCAKAPPYHRTLQIHERTH 
1280 CEYNQCWKAFAYHKTLQIHERTH

1281 YECNQCGKAFACYQSFQIHKRTH 

1282 YECNQCGKAFACNRYLQIHKRTH 

1283 YECNQCGKAPACPRYLQIHKRTH 

1284 YECNQCGKAFACLRNLQNHKTTH

1285 FECNQCGKAFAHHSTLQRHKRTH 

1286 YECNQCGKAFTRHSTLQIHKRTH 

1287 YECNQCGKAPTCRSNLQIHKRTH 

1288 YVCKQCGKAFTRSSHLQIHKITH 

1289 YICKQCGKAFARSSHLQIHKRSH 

1290 YKCKQCGKDFTHHSTLHIHKRIH 

1291 YSCKLCGKAPTHSNYLQIHKRIH

1292 YECNQCGKAFARNSNLLDHKRIH 

1293 YICKQCGKTFRYLSCFQKHERIH
1294 YACKQCDKAFKYLSSLQNHKRIH

1295 HACKQCGKSFKRQSNVQAHERNH 

1296 YTCKHCTKTFTTSSTRNSHEKTH
1297 YACKHCGKAFTTSSAKNSHERIH
1298 YACKHCGKAFTSSSDRNSHERIH 
Q64247_MOUSE

Q64247_MOUSE 

O88412_MOUSE

ZF35_MOUSE

Q9Z2X6_MOUSE 

ZF38_MOUSE 

OZF_MOUSE 

Q9Z0Q5_MOUSE 

ZF90_MOUSE 

OZF_MOUSE 

Q9Z0Q5_MOUSE 

ZF90_MOUSE 

Z151_MOUSE

OZF_MOUSE 

Q9Z0Q5_MOUSE

Q9Z162_MOUSE 

Q9Z162_MOUSE 

MAZ_MOUSE 

Q61898_MOUSE 

Q60585_MOUSE 

035483_MOUSE 

Q60585_MOUSE 

Q9Z1D9_MOUSE

Q9Z1D9_MOUSE

O88631_MOUSE

Q60585_MOUSE 

MLZ4_MOUSE

Q9Z116_MOUSE 

O70237_MOUSE 

GFI1_MOUSE

Q61624_MOUSE 

P97475_MOUSE 

Q61624_MOUSE

P97475_MOUSE

ZFP1_MOUSE

Q9Z116_MOUSE 

Q9Z116_MOUSE 

ZFP1_MOUSE

Q06054_MOUSE

Q06054_MOUSE

Q06054_MOUSE 

Q06054_MOUSE 

Q06054_MOUSE 

Q06054_MOUSE 

Q06054_MOUSE 

Q06054_MOUSE 

MLZ4_MOUSE

ZF37_MOUSE
1299 YPCKYCGKAFATSSDRNSHERIH

1300 YSCTHCGKAFSSPSDYNSCERIH

1301 YVCNECGKAFTCSSYLLIHQRIH

1302 YMCNHCYKHFSQSSDLIKHQRIH

1303 YVCKQCGKAFAQSSYLHIHQRSH

1304 YQCKDCGKAFSGKGSLIRHYRIH

1305 YECNKCGKAFSRITSLIVHVRIH 

1306 YECNECGKAFSQRTSLIVHVRIH
1307 YQCNVCGKAFKRSTSFIEHHRIH
1308 YECKICGKAPCQSSSLTVHMRSH
1309 YECNVCGKAFSQSSSLTVHVRSH

1310 YECIDCGKAFSQSSSLIQHERTH 

1311 CQCVICGKAFTQASSLIAHVRQH 

1312 YECKGCGKAFIQKSSLIRHQRSH 

1313 FECKDCGKAFIQKSNIil RHQRTH 

1314 TYCSKAFRDSYHLRRHQSCH 

1315 HACEMCGKAFRDVYHLNRHKLSH 

1316 HACEMCGKAFRDVYHLNRHKLSH 

13 17 FRCTECDKSFIRSSHLREHQKIH 

1318 FDCKECGKTFSRGYHLTLHQRIH 

1319 YACAECGRRFGQSAALTRHQWAH 

1320 YACTECGKSFRQVAHLTRHQRLN 

1321 YACPECGECFRQSSHLSRHQRTH 

1322 YKCFQCGERFRQSTHLVRHQRIH 

1323 YKCTKCDKLFTQYSHLRRHQRI Y 

1324 YKCTECKKAFRQHSHLTYHQRIH 

1325 HKCTECAKASAAS PHLIQHQRTH 

1326 YECTECSKAFCQKSHLTQHQRVH 

1327 YPCQPCGKRFHQKSDMKKHTYIH 

1328 YPCQYCGKRFHQKSDMKKHTFIH 

1329 FRCDECGMRFIQKYHMERHKRTH 

1330 FRCDECGMRFIQKYHMERHKRTH 

1331 FQCSQCDMRFIQKYLLQRHEKIH 
13 32 FQCSQCDMRFIQKYLLQRHEKIH 

1333 FVCNYCDKTFSFKSLLVSHKRIH 

1334 Y I CFECRKAFYRKSELTDHQRIH 

1335 YECKECGKAFCQKPQLTLHQRIH 

1336 YGCSECGKTFAQKFELTTHQRIH 

1337 YKCSDCGKCFIQKANLRTHQKIH 

1338 YKCSDCGKCFIQKANLRTHERIH 

1339 YKCSDCDKCFIQKAKLKKHQRIH 

1340 YKCSECDKCFIQKDHLRTHQRXiH 

1341 YKCSECDKCFIRKANLRRHHRIH 

1342 YKCSECHKCFIRKAHLRRHQRIH 

1343 YKCSECHKCFIQQAHLRRHQKIH 

1344 • YICAECNKCFIQKSQLKTHQRIH 

1345 HICSQCGKAFSQISDLNRHQKTH 

1346 YECNECGIAFSQKSHLWHQRTH 
Q62514_MOUSE 1347

ZF3 7_MOUSE 1348 

Q62514_M0USE 134 9 

ZF37_MOUSE 1350 

Q62514_MOUSE 1351 

MFG3_]yiOUSE 1352 

Q62514_MOUSE 1353 

ZF37_MOUSE 1354 

Q9Z116_]yiOUSE 1355 

088412_M0USE 1356 

Q9Z116_MOUSE 1357 

P70405_MOUSE 1358 

2F90__MOUSB 1359 

KR2_M0USE 1360 

KR2_M0USE 1361 

Q9Z1D7_M0USE 13 62 

Q61116_M0USE 1363 

O7023 7_MOUSE 13 64 

GFIl_MOUSE . 13 65 

Q9Z150_MOUSE 1366 

Q9Z1D7_M0USE 1367 

ZF35_MOUSE 1368* 

ZF38_MOUSE 1369 

OZF_MOUSE 1370 

Q9Z0Q5_MOUSE 1371 

2FP1_M0USE 1372 

MLZ4__M0USE 1373 

Q62514_M0USE 1374 

ZF37__M0USE 1375 

KR2_M0USE 1376 

P70405_MOUSE 1377 

Q61117_MOUSE 1378 

ZF92_MOUSE 1379 

ZF29_M0USE 1380 

088282_MOUSE 1381 

Q61065_iyiOUSE 13 82 

BCL6_M0USE 1383 

ZF29_MOUSB 1384 

Q9ZlD7_MOUSE 13 85 

ZF35_MOUSE 1386 

ZF35_MOUSE 1387 

ZF35_iyiOUSE 1388 

ZFP1_M0USE 1389 

ZF35_MOUSE 1390 

088412_MOUSE 1391 

MLZ4_M0USE 1392 

MLZ4_M0USE 13 93 

KR2 MOUSE 1394 



YECNECGIAFSQKSHLVLHQRTH 

YECVECGKAFSQKSHLIVHQRPH 

YECVECGKAFSQKSHLIVHQRTH 

FECNECGKTFSKKSHLVIHQRTH 

FECNECGKTFSKKSHLVIHQRTH 

FECKECGKAFHFSSQLNNHKTSH 

FECYECGKAFNAKSQLVIHQRSH 

FECYECGKAFNAKSQLVIHQRSH 

YECKI CGKCFYWKTSFNRHQSTH 

YSCNECGKAFRQKSSLTVHQRTH 

YECAECGKAFSTKSYLTVHQRTH 

YECSKCGKTFRGKYSLDQHQRVH 

HECADCGKTFLWRTQLTEHQRIH 

YECMICGKHFTGRSSLTVHQVIH 

YECDQCGKAFIKNSSLIVHQRIH 

YKCSVCGKAFIQKISLIBHBQIH 

YKCDTCGKAFSQKSSLQVHQRIH 

- -CRMCGKAFKRSSTLSTHLLIH 

-DCKICGKSFKRSSTLSTHLLIH 

HSCGI CGKCFTQKSTLHDHLNLH 

YKCEVCGKTFRWRTVLI RHKWH 

-YKCMCGKAFSQCSAFTLHQRIH 

YKCKECGKAFNHSSNFNKHHRIH 

YGCNECGKAFSQFSTLALHMRIH 

YGCNECGKAFSQFSTLALHLRIH 

YECTECGKTFSQRSTLRLHLRIH 

YKCDECGKNFSQNSDLVRHRRAH 

YECNECGKAFKYGSSLTKHMRIH 

YECNECGKAFKYGS SLTKHMRIH 

YKCHDCGKAFSKNSSLTQHRRIH 

CRDCGKFFSQTSHLNDHRRIHTG 

YKCSTCGKGFSRSSDLNVHCRIH 

YLCQQCGKS FSRS FNL I KHRII H 

YACKECGESFSYNSNLIRHQRIH 

YRCSICGARFNRPANLKTHSRIH 

YRCNI CGAQFNRPANLKTHTRIH 

YRCNICGAQFNRPANLKTHTRIH 

YKCRDCGKSFSRSANLITHQRIH 

YQCLQCNKSFNRRSTLSQHQGVH 

YPCNSCSKSFSRGSDLIKHQRVH 

YPCSWCIKSFSRSSDLIKHQRVH 

YPCNQCTKSFSRLSDLINHQRIH 

YECDVCQKTFSHKANLI KHQRIH 

YECDKCGKTFSQSSNLILHQRIH 

YECNECGKTFTRSSNLIVHQRIH 

YDCNECGKSFGRSSHLIQHQTIH ■ 

YECTACGKSFSRSSHLITHQKIH 

YECTECGKAFSQSAYLIEHRRIH 
ZF90_MOUSE

MLZ4_M0USE 

P70405_MOUSE 

P70405_MOUSE 

Q9Z1D8_M0USE 

KID1_M0USE 

P704 05_MOUSE 

P70405_MOUSE 

P70405_MOUSE 

Q9Z1D8_M0USE 

OZF_MOUSE 

Q9Z0Q5_MOUSE 

088412_MOUSE 

ZF35_M0USE 

ZF35_MOUSE 

KIDljy[OUSE 

OZF_MOUSE 

Q9Z0Q5_MOUSE 

OZF_MOUSE 

Q9Z0Q5_MOUSE 

OZF_MOUSE 

Q9Z0Q5_MOUSE 

KID1_M0USE 

088412_MOUSE 

088412_MOUSE 

088412_M0USE 

KR2_M0USE ' 

ZF38_iyiOUSE 

KID1_M0USE 

O35700_MOUSE 

EVI1_M0USE 

Q62518_MOUSE 

Q9Z1D8_M0USE 

Q9Z1D8_M0USE 

Q9Z1D7_M0USE 

MFG3_M0USE 

MFG3_M0USE 

088412_MOUSE 

Q9Z116_MOUSE 

Q60585_MOUSE 

Q60585_MOUSE 

ZF3 7_M0USE 

Q62514_MOUSE 

Q61849_MOUSE 

MFG3_M0USE 

Q61849_MOUSE 

Q06054_MOUSE 

035700 MOUSE 



13 95 YACKECGRNFSRSSALTKHHRVH 

1396 YECTECDKSFSRSSALIKHKRVH 

1397 YKCSECGKSFSQSSILIQHRRIH 
13 98 YKCSECGNSFSQSAILNQHRRIH 

13 99 HQCNECGKSFIQSAHLIQHRRIH 

1400 YRCQECGMSFGQSSALIQHRRIH 

1401 YECSQCGKSFSQKSGIiIQHQWH 

1402 YECRECGKSFSQKATLIKHQRVH 

1403 YECSQCGKSFSQKATLVKHKRVH 

14 04 HQCNECGRGFSLKSHLSQHQRIH 
14 05 YQCSECGKAFSQKSHHIRHQRIH 
1406 YQCSECGKAFSQKSHHIRHQKIH 
14 07 YDCSECGKAFSQLSCLIVHQRIH 

1408 YKCSECGKAFNQSSVLILHQRIH 

1409 YKCDVCGKAFSQSSDRILHQRIH 

1410 FKCNTCGKTFRQSSSRIAHQRIH 

1411 YKCNECGTI FRQKQYLI KHHNIH 

1412 FKCNECGTAFGQKKYLIKHQNIH 
1413 ' FECSQCGRAFSQKQYLIKHQNIH 

1414 FECNECGKAFSQKQYVIKHQSTH 

1415 FKCNECGKAFSQKENLI IHQRIH 

1416 FECSDCGKAFSQKENLLTHQKIH 

1417 FKCSECGRAFSQSASLIQHERIH 

1418 FECHECGKAFIQSANLWHQRIH 

1419 FTCSECGKGFSQSANLWHQRIH 

1420 FACSDCGKAFTQSANLI VHQRSH 

1421 YKCHECGKAFSQSMNLTVHQRTH 

1422 YQCNECGKSFSQHAGLSSHQRLH 

1423 YNCNECGKALS SHSTLI IHERIH 

1424 YKCDQCPKAFNWKSNLIRHQMSH 

1425 YKCDQCPKAFNWKSNLIRHQMSH 

1426 YKCDVCGKSFGWRSNLI IHHRIH 

1427 YACHLCGKAFRVRSHLVQHQS VH 

1428 YKCQVCGKAFRVSSHLVQHHSVH 

1429 YECNDCGKAFVYNSSIiATHQETH 

1430 YKCNACGRAFNRRSNLMQHEKIH 

1431 YKCNVCGKAFNRRSNLLQHQKIH 

1432 YVCGKCGKAFTQSSNLTVHQKIH 

1433 YECKECRKAFYDKSNLKRHQKIH 

1434 YECKECRKFFRRYSELI SHQGIH 

1435 YECKECGKAFRQCAHLSRHQRIH 

1436 YECIECGKAFKQNASLTKHMKIH 

1437 YECIECGKAFKQNASLTKHMKIH 

1438 YECNECGKAFKRHRSFVRHQKIH 

1439 FECKDCGKVFRLNIHLIRHQRFH 

1440 YECKECGKAFRLPQQLTRHQKCH 

1441 HRCNECGKSLSSSSGLQRHQRIH 

1442 HACPECGKTFATS SGLKQHKHIH 
EVI1_MOUSE 1443

ZF92_MOUSE 1444 

088412_MOUSE 1445 

ZF90_MOUSE 1446 

KID1_M0USE 1447 

ZF29_M0USE 1448 

OZFjyiOUSE 1449 

• O70162_MOUSE 1450 

ZFP1_M0USE 1451 

088412jyiOUSE 1452 

Q9Z1D7__M0USE 1453 

ZF90_MOUSE 1454 

Q64247_MOUSE ' 1455 

MFG3_M0USE 1456 

Q6 1 84 9_MOUSE 1457 

MFG3_M0USE 1458 

Q6184 9__MOUSE 1459 

Q6424 7_MOUSE 1460 

MFG3_M0USE 1461 

Q64247_MOUSE 1462 

Q6424 7_MOUSE 1463 

ZF90_MOUSE 14 64 

MFG3_M0USE 1465 

iyiFG3_M0USE 1466 

iyiFG3_M0USE 1467 

ZFP1_M0USE . . 1468 

ZF92_MOUSE 1469 

054978_MOUSE 1470 

O70162_MOUSE 1471 

O70162_MOUSE 1472 

O70162_MOUSE 1473 

O70162_MOUSE 1474 

O70162_MOUSE 1475 

Q9ZlD8_iyiOUSE 1476 

Q9ZlD8_MOUSE 1477 

ZF37_MOUSE 1478 

Q62514_MOUSE 1479 

088282_MOUSE 1480 

Q61065_MOUSE 1481 

BCL6_M0USE 1482 

Q60585_MOUSE 1483 

Q60585_iyiOUSE 1484 

Q60585_MOUSE 1485 

OZF_MOUSE 1486 

OZF_MOUSE 1487 

Q9Z0Q5_MOUSE 1488 

MFG3_M0USE 1489 

Q61849 MOUSE 1490 



HACPECGKTFATSSGLKQHKHIH 
YECGECGKTFTRSSNLVKHQVIH 
FKCSECEKAFSYSSQIiARHQKVH 
FECNVCGKAFRHSS SLGQHENAH 
YECWTCGKLFNHRSSLTNHYKIH 
YKCDECGKSFSDGSNFSRHQTTH 
YKCGECGKAFSQRGNFLSHQKQH 
CDVCGKVFSQRSNLLRHQKIHTG 
YECNECAKTFFKKSNIjI IHQKIH 
YKCKDCEKAFSCFSHLIVHQRIH 
YKCNECGRAFGQWSALNQHQRLH 
YQCSLCGKAFQRSSSIiVQHQRIH 

CGKVFILSGDLIKHERIH 

YECEQCGSAFRLPYQLTQHQRIH 
FECELCGSAFRCRSQLNKHLRIH 
FKCKLCESAFRRKYQIiSEHQRIH 
FKCQECGKAFWLAYLI EHQS IH 
FVCKQCGEAFVNS SHLI SHERIH 
FQCKECGRAFVRSTGLRIHERIH 
FVCKTCGKAFSRSDYLINHKRIH 
FVCKKCGKAFKRLGHFMNHERIH 
FQCKECGKAFSRCSSLVQHERTH 
FECKDCGKAFTVLAQLTRHQTIH 
FHCKVCGKAFTVLAQLTRHENIH 
FECKECGKSPKRVSSLVEHRI IH 
FECPECGKAFTHQSNLIVHQRAH 
FECTECGKAFSRSSNLIEHQRIH 
FECQECGEAFARRSELIEHQKIH 
FRCTECGQSFRQRSNLLQHQRIH 
FACAECGQSFRQRSNLTQHQRIH 
FACPECGQSPRQHANLTQHRRIH 
YACAECGKAFRQRPTLTQHLRTH 
AECGKTFRQRATLTQHLCVHTGE 
FRCEECGKSYNQRVHLIQHHRVH 
FKCGECGKSYNQRVHLTQHQRVH 
FECNQCGKAFKQ I EGLTQHQRVH 
FECNQCGKAFKQIEGLTQHQRVH 
YPCPTCGTRPRHLQTLKSHVRIH 
YPCE I CGTRFRHLQTLKSHLRIH 
YPCEICGTRPRHLQTLKSHLRIH 
YDCKECGKAFRVRQQLTLHERIH 
YDCKECGKAFRVRGQLMLHQRIH 
YECGECGKAFKVRQQIiTFHQRIH 
YACKECGKAFNGKSYLKEHEKIH 
YTCKECGKAFSGKSNIiTEHEKIH 
FICKECGKTFSGKSNLTEHEKIH 
YKCKDCGKCFGCKSNLHQHES IH 
YQCKECGKCFRQRSKLTEHESIH 
Q61849_MOUSE 1491

Q61849_MOUSE 1492 

ZF92_MOUSE 1493 

2F92_MOUSE 1494 

ZF35_MOUSE 1495 

P70405_MOUSE 1496 

REX1_M0USE 1497 

TYY1_M0USE 1498 

ZFX2_M0USE 1499 

ZFX1_M0USE 1500 

ZFA_MOUSE 1501 

ZFY2_M0USE 1502 

ZFYl^MOUSE 1503 

Q61116_MOUSE 1504 

Q06054_MOUSE 1505 

Q9Z117_MOUSE 1506 

Q61898_MOUSE 1507 

Q60585_MOUSE * 1508 

Q60585_MOUSE 1509 

Q60585_MOUSE 1510 

KR2__M0USE 1511 

KID1_M0USE 1512 

KR2_M0USE 1513 

ZF37_M0USE 1514 

Q62514_MOUSE 15X5 

KID1_M0USE 1516 

ZF37_M0USE 1517 

Q62514_MOUSE 1518 

Q9Z117_M0USE 1519 
088631__MOUSE ' 1520 

Q61898_MOUSE 1521 

Q9Z1D7_M0USE 1522 

03573 8_M0USE 1523 

O89090_MOUSE 1524 

Q64167_MOUSE 1525 

O89087_MOUSE 1526 

Q62445_MOUSE 1527 

O89091_MOUSE 1528 

Q61596_MOUSE 1529 

BTE1_M0USE 1530 

Q62445_MOUSE 1531 

Q64167_MOUSE 1532 

O89090_MOUSE 1533 

O89087_MOUSE 1534 

Q60843_MOUSE 1535 

EZF_MOUSE 1536 

Q60980_MOUSE 1537 

035738 MOUSE 1538 



YECKECGKCFGCRSTLTQHQSVH 
FECEECGKKFRTARHLVKHQRIH 
FVCRMCGKVFRRSFALLEHTRIH 
YECSECGKQFQRSLALLEHQRIH 
YECEECGKAFRMSSALVLHQRIH 
YECSECGKLFRQNSSLVDKQKTH 
HVCAECGKAFTESSKLKRHFLVH 
HVCAECGKAFVESSKLKRHQLVH 
HI CVECGKGFRHPSELKKHMRIH 
HI CVECGKGFRHPSELKKHMRIH 
HICVECGKGFCHPSELKKHMRIH 
HICGECGKGFRHPSALKKHIRVH 
FICGECGKGFRHPSALKKHIRVH 

- -CHECGKGFRQSSALQTHQRVH 
YQCRKCGKCFRTYSSLYRHRRTH 
HQCEKCRKCFSTASSLTVHKRIH 
HQCGKCGKCFNTSSSLTVHHRIH ■ 
YDCKECGKAFRLFSQLTQHQSIH 
YKCMECEKTFRLLSQLTQHQSIH 
YDCKECGKAFRLHSSLIQHQRIH 
YQCKECGKAFRKNSSLIQHERIH 
YLCNECGNTFKSSSSLRYHQRIH 
YGCDECGKTFRQSSSLLKHQRIH 
YKCNECGKTFRHSSNLMQHLRSH 
YKCNECGKTFRHS SNLMQHLRSH 
YKCNBCGKTFRCNSSLSNHQRTH 
YECKECGKSFRYNSSLTEHVRTH 
YECKECGKSFRYNSSLTBHVRTH 
YKCKECGKSFLELSHLKRHYRIH 
HKCKECGKSFFILSHLKTHYRIH 
YECKECGKSFIELSHLKRHYRIH 
HGCDECGKSFTQHSRLIEHKRVH 
FKCADCDRRFSRSDHLALHRRRH 

- -CPECPKRFMRSDHIiSKHIKTH 

- -CPECPKRFMRSDHLSKHIKTH 

- -CPECPKRFMRSDHLSKHIKTH 

- -CPECSKRFMRSDHLSKHVKTH 

- - CPMCDRRFMRSDHLTKHAEUm 

- - CPMCDRRFMRSDHLTKHARRH 

- -CPLCEKRFMRSDHLTKHARRH 

F I CNWMFCGKRFTRSDELQRHRRTH 
FMCNWSYCGKRFTRSDELQRHKRTH 
FMCNWSYCGKRFTRSDELQRHKRTH 
FMCNWSYCGKRFTRSDELQRHKRTH 
YHCNWEGCGWKFARSDELTRHYRKH 
YHCDWDGCGWKFARSDELTRHYRKH 
YKCTWEGCTWKFARSDELTRHFRKH 
YKCTWEGCTWKFGRSDELTRHYRKH 
Q9Z0Z7_MOUSE 1539

O70261_MOUSE 1540 

EKLF_MOUSE 1541 

Q61596_MOUSE 1542 

O89091_MOUSE 1543 

BTE1_M0USE 1544 

EGR2_M0USE 1545 

WTl^MOUSE 1546 

WT1_M0USE 1547 

EGR1_M0USE 1548 

KR2_M0USE 154 9 

O35700_MOUSE 1550 

EVI1_M0USE 1551 

ZF29_M0USE 1552 

ZF38_MOUSE 1553 

Q9Z1D8_M0USE 1554 

ZF29_MOUSE 1555 

ZF29_MOUSE 1556 

ZF38_MOUSE 1557 

ZF29_MOUSE 1558 

ZF90_MOUSE 1559 

MLZ4_M0USE 1560 

ZF29_M0USE 1561 

MLZ4_M0USE 1562 

MLZ4_M0USE 1563 

O70162_MOUSE 1564 

035483_MOUSE 1565 

035483_MOUSE 1566 

ZFP1_M0USE 1567 

GFI1_M0USE 1568 

O70237_MOUSE 1569 

ZF29_M0USE 1570 

KIDl__MOUSE 1571 

KID1_M0USE 1572 

Z151_M0USE 1573 

O35700_MOUSE 1574 

EVI1_M0USE 1575 

Q60585_MOUSE 1576 

Q92116_MOUSE 1577. 

KR2_M0USE 1578 

Q61164jyiOUSE 1579 

P97365_MOUSE 1580 

KID1_M0USE 1581 

ZF35_MOUSE 1582 

ZF3 5_MOUSE 1583 

ZF38_MOUSE 1584 

Q9Z1D9_M0USE 1585 

Q921D9 MOUSE 1586 



YKCTWEGCDWRFARSDELTRHYRKH 

YACSWDGCDWRFARSDELTRHYRKH 

YACSWDGCDWRFARSDELTRHYRKH 

FSCSWKGCERRFARSDELSRHRRTH 

FSCSWKGCERRFARSDELSRHRRTH 

FPCTWPDCLKKFSRSDELTRHYRTH 

YPCPAEGCDRRFSRSDELTRHIRIH 

YQCDFKDCERRFSRSDQLKRHQRRH 

FQCKTCQRKFSRSDHLKTHTRTH 

FQCRICMRNFSRSDHLTTHIRTH 

YQCNECGKPFSRSTNLTRHQRTH 

YTCRYCGKI FPRSANLTRHLRTH 

YTCRYCGKIFPRSANLTRHLRTH 

FQCAECGKSFSRSPNIilAHQRTH 

YVCTKCGKAFSHSSNLTLHYRTH 

YQCDSCGKAFSYSSDLIQHYRTH 

YQCGECGKNFSRSSNLATHRRTH 

YRCPECGKGFSNSSNFITHQRTH 

YI CAECGKAPSNS SNLTKHRRTH 

YECLTCGESFSWSSNLIKHQRTH 

YECNECGEAFSRLS SLTQHERTH 

YHCNECGENFSRI SHLVQHQRTH 

YKCLMCGKSFSRGS I LVMHQRAH 

YECEECGKSFSRSSHIjAQHQRTH 

YKCYECGKGFSRSSHLIQHQRTH 

FACPECGQRFSQRLKLTRHQRTH 

FPCPECGKRFSQRSVLVTHQRTH 

- -CDECGKGFVYRSHLAIHQRTH 

YECSECGKSPIQNSQLIIHRRTH 

HKCQVCGKAFSQSSNLITHSRKH 

HKCQVCGKAFSQSSNLITHSRKH 

YKCTECGQKFSQSSALITHRRTH 

FKCKECSKAFSQSSALiIQHQITH 

CKCKVCGKAFRQSSALIQHQRMH 

YVCERCGKRFVQSSQLANHIRHH 

YECENCAKVFTDPSNLQRHIRSQH 

YECENCAKVFTDPSNLQRHIRSQH 

YECKKCAKI PTCS SDLRGHQRSH 

YECTVCRKSFICKSSFSHHWRTH
YTCNVCDKHFIERSSLTVHQRTH
FQCSLCSYASRDTYKLKRHMRTH
FQCWLCSAKFKISSDLKRHMRVH
YKCSMCEKTFINTSSLRKHEKNH

YTCNLCSKSPSQSSDLTKHQRVH 
YHC S SCNKAFRQS SDIil LHHRVH 
YWCSHCGKTPCSKSNLSKHQRVH 
YKCGDCEKSFRQRSDLFKHQRTH 
YKCDSCEKGFRQRSDIiFKHQRIH 






ZF35_MOUSE 1587 

ZF35_MOUSE 158 8 

ZF35_MOUSE 1589 

ZF35_MOUSE 1590 

ZF35_MOUSE 1591 

ZF35_MOUSE 1592 

Q9Z1D9_M0USE 1593 

Q9Z1D9_M0USE 1594 

Q9Z116_MOUSE 1595 

Q06054_MOUSE 1596 

Q9Z116_MOUSE 1597 

ZF29_M0USE 1598 

MLZ4_M0USE 1599 

ZF37_MOUSE 1600 

Q62514_MOUSE 1601 

ZF90_MOUSE 1602 

Q61491_MOUSE 1603 

ZF35_MOUSE 1604 

Q64247_MOUSE 1605 

Q61116_MOUSE 1606 

035483_iyiOUSE 1607 

ZF29_M0USE 1608 

Q61117_M0USE 1609 

Q9Z2U2_MOUSE 1610 

Q61116_MOUSE 1611 

Q61117_MOUSE 1612 

Z23 9_MOUSE 1613 

Z239_MOUSE 1614 

Z239_M0USE 1615 

Z239_iyiOUSE 1616 

ZF35_MOUSE 1617 

ZF38_MOUSE 1618 

O35700_MOUSE 1619 

EVI1_M0USE 1620 

035483__MOUSE 1621 

035483_MOUSE 1622 

O70162_MOUSE 1623 

088412_MOUSE 1624 

088631_MOUSE 1625 

088631_MOUSE 1626 

Z239_MOUSE 1627 

Z239__MOUSE 1628 

MLZ4_M0USE 162 9 

MLZ4_iyiOUSE 163 0 

Q61116_MOUSE 1631 

Q61116_MOUSE 1632 

Q62518_MOUSE 1633 

Q9Z150 MOUSE 1634 
YPCSQCSKMFSRJtSDLVKHYRIH

YQCSHCSKSFSQHSGMVKHLRIH 

YACTQCPRSFSQKSDLIKHQRIH 

YPCAQCNKSFSQNSDLIKHRRIH 

YMCNHCYKHFSQSSDLIKHQRIH 

YNCDECDQS FAWSTGLI RHQRTH 

YQCQECGKRFSQSAALVKHQRTH 

YACWCGRRFSQSATLIKHQRTH 

YECKQCMKTFYRKSGLTRHQRTH 

YECKQCSKSPYTSSHLENHYRTH 

YECQLCQKAFYCTSHLIVHQRTH 

YECPQCGKTFSRKSHLITHERTH 

YECVQCGKGFTQSSNLI THQRVH 

YECNHCGKVLSHKQGLLDHQRTH 

YECNHCGKVLSHKQGLLDHQRTH 

YECNECGRAFRKKTNLHDHQRTH 

YECNQCGRAFRQYVYLQCHERIH 

YPCAQCGKSFSQRSDLVNHQRVH 

YVCEQCGKGFIQLKYLLMHQRSH 

YTCQQCGKGFSQASYFHMHQRVH 

YRCVFCGAGFGRRSYCVTHQRTH 

YRCGDCGKGFSQRSQLWHQRTH 

YRCDI CGKRFRQRSYLHDHHRIH 

FKCWP S CTKTFTRNSNLRAHCQLVH 

YRCDSCGKGFSRSSDLNIHRRVH 

YQCHACWKSFCHSSEFNNHI RVH 

YQCYECGKGFSQSSDLRIHLRVH 

FKCDRCGKGFSQSSKLHIHKRVH 

YHCGKCGQGFSQSSKLLIHQRVH 

YKCGECGKGFSQSSNLHIHRCTH 

YKCDECGKAFSQSSDLMIHQRIH 

YDCKCGKAFGQS SDLLKHQRMH 

YRCKYCDRSFSISSNLQRHVRNIH 

YRCKYCDRS FS I S SNLQRHVRNIH 

YRCVFCGRSFSQSSALARHQAVH 

YLCSNCGRRFSQSSHLLTHMKTH 

FVCGECGRSFSRSSHLLRHQLTH 

YECAKCGAAFI SNSHLMRHHRTH 

YKCMECDRSYIQYSHLKRHQKVH 

YKCKECGKSYAYRTGLKRHQKIH 

YECSKCGKGFSQSSNLHIHQRVH 

YACEECGMSFSQRSNLHISQRVH 

YECNECWRS FGERSDLI KHQRTH 

YECHECGRGFSERSDLIKHYRVH 

YECNECGKRFSLSGNLDIHQRVH 

YKCGDCGKRFSCSSNLHTHQRVH 

YKCGECGKSFI CS SNLYI HQRVH 

CPRCGKQFimSSNLNRHMNVHRG 
Q61116_MOUSE

QG1116__M0USE 

Q62518_MOUSE 

Q62518_MOUSE 

Q61898_MOUSE 

Q61898_MOUSE 

Q9Z117_MOUSE 

Q61898_MOUSE 

08863 l_MOUSE 

Q61898_MOUSE 

Q9Z117_MOUSE 

Q61898_MOUSE 

088631__MOUSE 

Q9Z117_MOUSE 

Q61898_MOUSE 

Q9Z117_MOUSE 

Q61898_MOUSE 

Q9Z117_iyiOUSE 

Q06054_MOUSE 

Q06054_MOUSE 

Q06054_MGUSE 

Q06054__iyiOUSE 

Q06054_MOUSE 

Q61898_iyiOUSE 

Q61898_MOUSE 

Q61898_MOUSB 

Q9Z117_iyiOUSE 

Q61898_MOUSE 

Q9Z117_MOUSE 

088631_iyiOUSE 

Q61898_MOUSE 

Q9Z117_MOUSE 

088631___MOUSE 

088631_MOUSE 

Q9Z117_MOUSE 

Q61898_MOUSE 

Q9Z117_MOUSE 

Q61898_MOUSE 

KID1_M0USE 

ZF29_!yiOUSE 

Q9Z117_MOUSE 

Q61898_MOUSE 

088631_MOUSE 

Q08376_MOUSE 

Q60636_MOUSE 

Q61116_MOUSE 

088282_iyiOUSE 

Q61065 MOUSE 



1635 FHCSVCGKNFSRSSHFLDHQRIH 

1636 KCNVCQKQFSKTSNLQAHQRVH 

1637 YSCDVCGKGFSRS SQLQSHQRVH 

1638 FKCDACGKSFSRSSHLRSHQRVH 

1639 YKCRECDKSFTQRAYLRJSTHHNRVH 

1640 YKCMECDKSFTHNSNFRTHQRVH 

1641 YKCMECNKSFTQDSHLRTHQRVH 

1642 YKCI ECDKSFTQVSHLRTHQRVH 

1643 YKCSECDKSFTQASQLRTHQRVH 

1644 YKCNECDRSFTHYASLRVJHQKTH 

1645 YKCKECDKS FAHCS SFRRHQKTH 

1646 YKCKECDKSFAHYPNFRTHQKIH 

1647 YKCKDCD I FFNHYS SLRRHQKVH 

1648 YKCKDCDISFIQISNLRRHQRVH 

1649 YKCRDCDI SFSQI SNLRRHQKLH 

1650 FKCRECDKSFTKCSHLRRHQS VH 

1651 YKCRECDKSFIHSSHLRRHQNVH 

1652 YKCRECDKS F I QRSNLI IHQRVH 

1653 YKCSECEKSFTCGSVLRKHQKIH 

1654 YKCSECEKSFTVGSDLRMHQKIH 

1655 YKCSECEKCFTWSDLRTHQKIH 

1656 YKCSECEKSFTVGSSLRIHQRIH 

1657 YKCECGKSFTVGSDLRKHQKCH 

1658 YKCIECGKSFTNNSYLRTHQKVH 

1659 YRCKECDKSFHESATLREHEKSH 

1660 YRCAECDKSFTRCSYLRAHQKIH 

1661 YRCKECDKSFTECSTLRAHQKIH 

1662 YRCKECDKSFTSCSTLKAHQS IH 

1663 YICKECGKSFTRCSYLRAHQKIH 

1664 YVCKECGKSLTTCAILRAHQKIH 

1665 YECKECGKS FTTCSTLR IHQT I H 

1666 YICKECGKSFTKCSTLQIHQKIH 

1667 YTCKQCGKS FTRGSTLRVHQRIH 

1668 YKCNI CDKSFTECSSLKEHRKTH 

1669 YKCEVCDKSFTVNSTLKTHLKIH 

1670 YKCEI CDKSFTTTTTLKTHQKIH 

1671 YKCSVCGKSFTQCTNLKTHQRLH 

1672 YKCS VCDKS PTQCTHLKIHQRRH 

1673 YRCKECGKSFGRRSGLFIHQKVH 

1674 YSCPECGKSFGNRSSLNTHQGIH 

1675 YKCKECGKSFPQLSALKSHQKIH 

1676 YKCKECEKS F VQLS ALKSHQKLH 

1677 YKCNDCGKSFSYLSALQSHHKRH 

1678 FVCEMCTKGFTTQAHLKEHLKIH 

1679 FKCQTCNKGFTQLAHLQKHYLVH 

1680 YKCEVCGKGFTQWAHLQAHERIH 

1681 YKCETCGSRFVQVAHLRAHVLIH 

1682 YKCETCGARFVQVAHLRAHVLIH 
BCL6_MOUSE 1683

08863 1_M0USE 1684 

Q61116_MOUSE 1685 

Z239_MOUSE 1686 

ZF29_MOUSE 1687 

Q62518_MOUSE - 1688 

Q61117_M0USE 1689 

Q61117_MOUSE 1690 

Q61116_MOUSE 1691 

Q61117_MOUSE 1692 

Q61116_MOUSE 1693 

Q61116_iyiOUSE 1694 

Q61117__M0USE 1695 

Q61117_M0USE 1696 

Q61117_MOUSE 1697 

Q61117_MOUSE 1698 

Q62518_M0USE 1699 

ZF29_MOUSE 1700 

O70162_MOUSE 1701 

KID1_M0USE 1702 

TYY1_M0USE 1703 

REX1_M0USE 1704 

TYY1_M0USE 1705 

1VITF1_M0USE 1706 

GLI_iyiOUSE 1707 

GLI3_M0USE 1708 

ZIC2_M0USE 1709 

ZIC1_M0USE 1710 

ZIC3_M0USE 1711 

ZIC4_M0USE 1712 

GLI_MOUSE 1713 

GLI3_M0USE 1714 

O70230_MOUSE 1715 

MTFl^MOUSE 1716 

MTF1_M0USE 1717 

O70230_MOUSE 1718 

MTFl^MOUSE 1719 

O70230_MOUSE 1720 

ZIC4_M0USE 1721 

ZIC2_M0USE 1722 

ZIC1_M0USE 1723 

ZIC3_M0USE 1724 

O7023 0_MOUSE 1725 

O70230_MOUSE 1726 

MTF1_M0USE 1727 

O70230_MOUSE 1728 

BTE1_M0USE 1729 

Q920Z7 MOUSE 1730 



YKCETCGARFVQVAHLRAHVLIH 

YRCE VCDKWFTL S S S LSRHQKI H 

YRCEVCGKRFPWSLSLiHSHQSVH 

YKCDKCGKGFTRS S SLLVHHSLH 

YKCGLCGKSFSQSSSLIAHQGTH 

YKCVDCGKEFSRPSSLQAHQGIH 

YRCEECGKGFSWSSSLLIHQRAH 

YKCEECGKVFSWSSYLKAHQRVH 

FKCEECGKEFRWSVGLSSHQRVH 

YKCETCGKAFSRVS I LQVHQRVH 

YKCEECGKGFSSASSFQSHQRVH 

YKCGECGKGFSHASSLQAHHSVH 

YQCAECGRGFTVESHLQAHQRSH 

YQCEECGRGFCRASNFLAHRGVH 

YKCEECGKGFTRASTLLDHQRGH 

YVCEECGKGFSQASHLLAHQRGH 

YNCETCGSAFSQASHLQDHQRLH 

YRCPECGKGFSWNSVLI IHQRIH 

YCCGECDLGFTQVSRLTEHQRIH 

YRCSECGKGFTS I SRLiNRHRI IH 

YVCPFDGCNKKFAQSTNLKSHILTH 

YQCTFEGCGKRFSLDFNLRTHIRIH 

FQCTFEGCGKRFSLDFNLRTHVRIH 

YQCTFEGCPRTYSTAGNLRTHQKTH 

HKCTFEGCRKSYSRLENLKTHLRSH 

HKCTFEGCTKAYSRLENLKTHLRSH 

FQCEFEGCDRRFANSSDRKKHMHVH 

PKCEFEGCDRRFANS SDRKKHMHVH 

FKCEFEGCDRRFANS SDRKKHMHVH 

FRCEFEGCERRFANSSDRKKHSHVH 

YMCEQEGCSKAFSNASDRAKHQNRTH 

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCTVPGCDKRFTEYSSLYKHHWH 

FECDVQGCEKAFNTLYRLKAHQRLH 

FVCNQEGCGKAFLTSYSLRIHVRVH 

YQCEHSGCGKAFATGYGLKSHFRTH 

FRCDHDGCGKAFAASHHLKTHVRTH 

FKCPIEGCGRSFTTSNIRKVHIRTH 

FPCPFPGCGKVFARSENLKIHKRTH 

FPCPFPGCGKVFARSENLKIHKRTH 

FPCPFPGCGKVFARSENLKIHKRTH 

PPCPFPGCGKIFARSENLKIHKRTH 

YYCTEPGCGRAFASATNYKNHVRIH 

YRCSEDNCTKSFKTSGDLQKHIRTH 

FNCESQGCSKYFTTIiSDLRKHIRTH 

FRCKYDGCGKLYTTAHHLKVHERSH 

HKCPYSGCGKVYGKSSHLKAHYRVH 

- -CDYNGCTKVYTKSSHLKAHLRTH 
Q60980_MOUSE 1731

035738_MOUSE 1732 

Q61596_M0USE 1733 

O89091_MOUSE 1734 

Q60843_MOUSE 1735 

EZF_MOUSE 1736 

Q64167_M0USE 1737 

O89090_MOUSE 1738 

089087_MOUSE 1739 

Q62445_M0USE 1740 

O702 61_MOUSE 1741 

EKLF_MOUSE 1742 

WT1_M0USE 1743 

ZEP1_M0USE 1744 

Q61479_MOUSE 1745 

O55140_iyiOUSE 1746 

Q60636_MOUSE 1747 

SNAI_MOUSE 1748 

P97469_MOUSE 1749 

2IC2_M0USE 1750 

2IC3_M0USE 1751 

Q62065_MOUSE 1752 

Q62065_MOUSE 1753 

IKAR_MOUSE 1754 

Q9Z2Z2_MOUSE 1755 

HELI_MOUSE . 1756 

Q61164_MOUSE 1757 

Q6.1624_MOUSE ' 1758 

P97475_M0USE 1759 

Z151_M0USE 1760 

Q62511_MOUSE 1761 

MAZ_MOUSE 1762 

08893 9_M0USE 1763 

Q64321_MOUSE 1764 

P97365__MOUSE 1765 

088939_MOUSE 1766 

Q64321_MOUSE 1767 

Z151_M0USE 1768 

Z151_M0USE 1769 

Z151 MOUSE 1770 



HRCDYDGCNKVYTKSSHLICAHRRTH 

HRCDFEGCNKVYTKS SHLKAHRRTH 

HICSHPGVGKTYFKSSHLECAHVRTH 

HICSHPGCGKTYFKSSHLKAHVRTH 

HTCSYTNCGKTYTKS SHLKAHLRTH 

HTCDYAGCGKTYTKSSHLKAHLRTH 

HICHIQGCGKVYGKTSHLRAHLRWH 

HICHIQGCGKVYGKTSHLRAHLRWH 

HICHIQGCGKVYGKTSHLRAHLRWH 

HVCHIEGCGKVYGKTSHLRAHLRWH 

HTCGHEGCGKSYTKSSHLKAHLRTH 

HTCGHEGCGKS YSKS SHLKAHLRTH 

FMCAYPGCNKRYFKLSHLQMHSRKH 

YI CEYCNRACAKPSVLLKHIRSH 

YICQYCSRPCAKPSVLQKHIRSH 

YICPYCSRACAKPSVLKKHIRSH 

HECQVCHKRFSSTSNLKTHLRLH 

CVCTTCGKAFSRPWLLQGHVRTH 

CVCKICGKAFSRPWLLQGHIRTH 

HVCFWEECPREGKPFKAKYKLVNHIRVH 

HVCYWEECPREGKSFKAKYKLVNHIRVH 

HECKLCGASFRTKGSLIRHHRRH 

HVCQFCSR6FREKGSLVRHVRHH 

FQCNQCGAS FTQKGNLLRHI KLH 

FHCNQCGASFTQKGNLLRHI KLH 

FHCNQCGASFTQKGNLLRHI KLH 

HKCHLCGRAFRTVTLLRNHLNTH 

HVCEHCNAAFRTNYHLQRHVFIH 

HVCEHCNAAFRTNYHLQRHVFIH 

YVCTHCQRQFADPGGLQRHVRIH 

YICEYCARAFKSSHNLAVHRMIH 

YICALCAKEFKNGYNLRRHEAIH 

YECNI CKVRFTRQDKLKVHMRKH 

- -CEVCGVRFTRNDKLKIHMRKH 
PHKCEVCGKCFSRKDKLKTHMRCH 
YLCQQCGAAFAHNYDLKNHMRVH 
YSCPHCPARFLHSYDLKNHMHLH 
HKCEDCGKEFTHTGNFKRH I RI H 
YRCGDCGKLFTTSGNLKRHQLVH 

- KCREC6KQFTTSGNLKRHLRI H 



Chicken database.

35 finger units SEQ ID NO

Q92010_CHICK 1771 YSCEVCGKSFIRAPDLKKHERVH
Q90851_CHICK 1772 YPCTICGKKFTQRGTMTRHMRSH
Q90850_CHICK 1773 YPCTICGKKFTQRGTMTRHMRSH
Q90851_CHICK 1774 --CDACGMRFTRQYRLTEHMRIH
Q90850_CHICK 1775 --CDACGMRFTRQYRLTEHMRIH
CTCF_CHICK 1776 HKCPDCDMAFVTSGELVRHRRYKH
ZKR1_CHICK 1777 -TCGDCGKGFAWASHLQRHRRVH
ZKR1_CHICK 1778 HRCGDCGKGFAWASHLQRHRRVH
ZKR1_CHICK 1779 HRCGDCGKGFVWASHLERHRRVH
ZKR1_CHICK 1780 --CPDCGKSPPWASHLERHRRVH
Q92010_CHICK 1781 --CHMCDKAFKHKSHLKDHERRH
O42408_CHICK 1782 HECGICKKAFKHKHHLIEHMRLH
DEFI_CHICK 1783 HECGICKKAFKHKHHLIEHMRLH
O42408_CHICK 1784 FKCTECGKAFKYKHHLKEHLRIH
DEFI_CHICK 1785 FKCTECGKAFKYKHHLKEHLRIH
O42409_CHICK 1786 YPCQYCGKRFHQKSDMKKHTYIH
O42409_CHICK 1787 FECKMCGKTFKRSSTLSTHLLIH
ZKR1_CHICK 1788 YECPECGEAFSQGSHLTKHRRSH
ZKR1_CHICK 1789 YECPECGEAFSQGSHLTKHRRSH
ZKR1_CHICK 1790 YSCPECGESYSQSSHLVQHRRTH
O42409_CHICK 1791 HKCQVCGKAFSQSSNLITHSRKH
O57415_CHICK 1792 YQCNICDYIAADKAALIRHLRTH
CTCF_CHICK 1793 FQCSLCSYASRDTYKLKRHMRTH
O57415_CHICK 1794 YKCQTCERTFTLKHSLVRHQRIH
Q92010_CHICK 1795 FVCEMCTKGFTTQAHLKEHLKIH
O57415_CHICK 1796 -TCPYCPRVPSWASSLQRHMLTH
O57415_CHICK 1797 HSCSICGKSLSSASSLDRHMLVH
O57415_CHICK 1798 --CTVCNKRFWSLQDLTRHMRSH
Q91051_CHICK 1799 CVCKICGKAFSRPWLLQGHIRTH
O12939_CHICK 1800 CVCKMCGKAFSRPWLLQGHIRTH
O57415_CHICK 1801 YKCSVCGQSFTTNGNMHRHMKIH
IKAR_CHICK 1802 FQCNQCGASFTQKGNLLRHIKLH
CTCF_CHICK 1803 HKCHLCGRAFRTVTLLRNHLNTH
O93567_CHICK 1804 YECNICNVRFTRQDKLKVHMRKH
O93567_CHICK 1805 YLCQQCGAAFAHNYDLKNHMRVH

Plant Database.

52 finger units SEQ ID NO

O22089_PETHY 1806 HECSICGEQFLLGQALGGHMRKH
O22088_PETHY 1807 HECSFCGEDFPTGQALGGHMRKH
O22087_PETHY 1808 -ECSFCGEDFPTGQALGGHMRKH
Q39092_ARATH 1809 HKCKLCWKSFANGRALGGHMRSH
Q39217_ARATH 1810 HKCSICSQSFGTGQALGGHMRRH
P93713_PETHY 1811 HECSICGLEFAIGQALGGHMRRH
O22086_PETHY 1812 HECSICGLEFPIGQALGGHMRRH
O22085_PETHY 1813 HECSICGMEFSLGQALGGHMRRH
O22084_PETHY 1814 HECSICGMEFSLGQALGGHMRRH
Q42453_ARATH 1815 HPCPICGVKFPMGQALGGHMRRH
Q42410_ARATH 1816 HPCPICGVEFPMGQALGGHMRRH
O65150_TOBAC 1817 HVCSICHKAFPTGQALGGHKRRH
Q40897_PETHY 1818 HVCSICHKAFPTGQALGGHKRRH
Q40896_PETHY 1819 HVCSICHKAFPSGQALGGHKRRH
Q42430_WHEAT 1820 HRCSICQKEFPTGQALGGHKRKH
Q40899_PETHY 1821 HECSICHKCFPTGQALGGHKRCH
P93166_SOYBN 1822 HECSICHKSFPTGQALGGHKRCH
Q96289_ARATH 1823 HVCTICNKSFPSGQALGGHKRCH
Q42423_ARATH 1824 HVCTICNKSFPSGQALGGHKRCH
O22533_ARATH 1825 HVCSICHKSFATGQALGGHKRCH
Q40898_PETHY 1826 HECSICHKCFSSGQALGGHKRRH
Q38895_ARATH 1827 YTCSFCKREFRSAQALGGHMNVH
O23621_ARATH 1828 YTCNFCRREFRSAQALGGHMNVH
O80942_ARATH 1829 YTCSFCRREFKSAQALGGHMNVH
P93714_PETHY 1830 HECSYCGTVEFTSGQALGGHMRRH
Q43614_PETHY 1831 HECAICGAEFTSGQALGGHMRRH
O22083_PETHY 1832 HECSICGAEFTSGQALGGHMRRH
Q41070_PEA 1833 HECSICGAEFTSGQALGGHMRRH
Q42375_ARATH 1834 HECSICGSEFTSGQALGGHMRRH
O65499_ARATH 1835 HKCNICFRVFSSGQALGGHMRCH
O22090_PETHY 1836 HECPVCFRVFSSGQALGGHKRTH
O22082_PETHY 1837 HECPVCYRVFSSGQALGGHKRSH
P93717_PETHY 1838 HECSICHRVFSTGQALGGHKRCH
O04177_BRARA 1839 HTCSICFKSFSSGQALGGHKRCH
O04176_BRARA 1840 HTCSICFKSFSSGQALGGHKRCH
P93715_PETHY 1841 HQCSICHRVFSSGQALGGHKRCH
Q39092_ARATH 1842 HECPICAKVFTSGQALGGHKRSH
P93719_PETHY 1843 HECPYCDRVFKSGQALGGHKRSH
P93718_PETHY 1844 HACPFCPRMFKSGQALGGHKRSH
O22091_PETHY 1845 YECPLCFKIFQSGQALGGHKRSH
Q42430_WHEAT 1846 -KCSVCGKSFSSYQALGGHKTSH
O04177_BRARA 1847 YKCTVCGKSFSSYQALGGHKTSH
O04176_BRARA 1848 YKCTVCGKSFSSYQALGGHKTSH
Q96289_ARATH 1849 YKCSVCDKTFSSYQALGGHKASH
Q42423_ARATH 1850 YKCSVCDKTFSSYQALGGHKASH
Q40897_PETHY 1851 YKCSVCDKSFSSYQALGGHKASH
Q40896_PETHY 1852 YKCSVCDKSFSSYQALGGHKASH
Q40898_PETHY 1853 YKCNVCNKSFHSYQALGGHKASH
O65150_TOBAC 1854 YKCSVCDKAFSSYQALGGHKASH
P93166_SOYBN 1855 YKCSVCDKSFPSYQALGGHKASH
Q40899_PETHY 1856 YKCSVCGKGFGSYQALGGHKASH
O22533_ARATH 1857 YKCSVCDKAFSSYQALGGHKASH



Arabidopsis Database 



SEQ ID NO

Q9ZU64/169-191 1858 YTCPKCNSIFDTSQKFAAHMSSH
O23621/40-62 1859 YTCNFCRREFRSAQALGGHMNVH
O23504/5-27 1860 HKCKLCSKSFCNGRALGGHMKSH
Q9SYC5/250-275 1861 WYCSCGSDFKHKRSLKDHVKAFGNGH
Q9SYC5/224-246 1862 FACRMCGKAFAVKGDWRTHEKNC
O22533/89-111 1863 YKCSVCDKAFSSYQALGGHKASH
O22533/148-170 1864 HVCSICHKSFATGQALGGHKRCH
Q9SN24/149-171 1865 HNCSICFKSFPSGQALGGHKRCH
Q9SN24/94-116 1866 YKCSVCGKSFPSYQALGGHKTSH
Q9STI7/117-140 1867 YFCGVCDRRFYTNEKLINHFKQIH
Q9STM3/1296-1320 1868 LKCPWKGCKMTFKWAWSRTEHIRVH
Q9STM3/1243-1268 1869 YQCNMEGCTMSFSSEKQLMLHKRNIC
Q9STM3/1271-1290 1870 KGCGKNFFSHKYLVQHQRVH
Q9STM3/1326-1352 1871 YVCAEPDCGQTFRFVSDFSRHKRKTGH
Q9STM3/1296-1320 1872 LKCPWKGCKMTFKWAWSRTEHIRVH
O81801/142-164 1873 PMCNVCGKGFASWKAVFGHLRQH
O65601/61-83 1874 QKCEKCSREFCSPVNFRRHNRMH
Q9SFY6/118-140 1875 YKCSVCDKTFSSYQALGGHKASH
Q9SFY6/174-196 1876 HVCTICNKSFPSGQALGGHKRCH
O65245/147-171 1877 FYCELCSKQYRTVMEFEGHLSSYDH
Q39261/52-74 1878 PSCNYCQRKFYSSQALGGHQNAH
Q9SSW0/118-140 1879 HVCSVCGKSFATGQALGGHKRCH
Q9SSW0/75-97 1880 YKCGVCYKTFSSYQALGGHKASH
Q39262/61-83 1881 FSCNYCQRTFYSSQALGGHQNAH
Q9SSW1/164-186 1882 HTCSICFKSFASGQALGGHKRCH
Q9SSW1/97-119 1883 YKCTVCGKSFSSYQALGGHKTSH
Q9ZPT0/145-167 1884 WVCERCSKGYAVQSDYKAHLKTC
Q9ZPT0/67-89 1885 YICEICNQGFQRDQNLQMHRRRH
Q9ZPT0/172-193 1886 HSCDCGRVFSRVESFIEHQDNC
Q39263/85-107 1887 PSCNYCQRKFYSSQALGGHQNAH
Q9SGD1/291-316 1888 WYCTCGSDFKHKRSLKDHIRSFGSGH
Q9SGD1/265-287 1889 FSCGKCGKALAVKGDWRTHEKNC
Q9SGD1/180-202 1890 FACSICSKTFNRYNNMQMHMWGH
Q9SSW2/106-128 1891 YKCNVCEKAFPSYQALGGHKASH
Q9SSW2/165-187 1892 HECSICHKVPPTGQALGGHKRCH
Q39264/60-82 1893 HECQYCGKEFANSQALGGHQNAH
P93815/7-30 1894 QECAVCKRVFLSSHQLISHYNAAH
Q39265/41-63 1895 YECQYCCREFANSQALGGHQNAH
Q39266/59-81 1896 FSCNYCRRKFYSSQALGGHQNAH
Q39267/93-115 1897 FECHYCFRNFPTSQALGGHQNAH
Q9SVY1/301-323 1898 FMCRKCGKAFAVRGDWRTHEKNC
Q9SVY1/217-239 1899 FSCPVCFKTFNRYNNMQMHMWGH
Q9SGH2/1804-1827 1900 IHCLICHKTFASDDEFEDHTESKC
Q38895/47-69 1901 YTCSFCKREFRSAQALGGHMNVH
Q9SLB8/49-71 1902 YTCSFCRREFRSAQALGGHMNVH
Q9SL35/188-210 1903 HECSICGSEFTSGQALGGHMRRH
Q9SL35/113-135 1904 YECKTCNRTFSSFQALGGHRASH
O81013/49-71 1905 HFCVICEKQFSSGKAYGGHVRIH
O81013/119-141 1906 IRCCLCGKEFQTMHSLFGHMRRH
O23395/664-686 1907 LHCEKCGKALQPTEMEKHLKVFH
Q9SI97/34-56 1908 FACKTCNKEFPSFQALGGHRASH
Q9SI97/78-100 1909 HECPICGAEFAVGQALGGHMRKH
Q9SR34/35-57 1910 YVCSFCIRGFSNAQALGGHMNIH
Q42485/68-90 1911 FSCNYCQRKFYSSQALGGHQNAH
O82389/126-149 1912 FPCNSCGEIFPKINLLENHIAIKH
Q9SQX8/182-204 1913 YQCKTCDRTFPSFQALGGHRASH
Q9SQX8/261-283 1914 HECGICGAEFTSGQALGGHMRRH
O65499/222-244 1915 HKCNICFRVFSSGQALGGHMRCH
O65499/77-99 1916 RPCTECGRKFWSWKALFGHMRCH
O65499/162-184 1917 FECGGCKKVFGSHQALGGHPJVSH
Q9SCM4/220-243 1918 DVCPKCSRGFRDPVDLLKHIDKDH
Q96289/80-102 1919 YKCSVCDKTFSSYQALGGHKASH
Q96289/136-158 1920 HVCTICNKSFPSGQALGGHKRCH
Q9SCQ6/139-161 1921 WKCDKCSKKYAVQSDWKAHSKIC
Q9SCQ6/166-187 1922 YKCDCGTLFSRRDSFITHRAFC
Q9SCQ6/63-85 1923 FVCEICNKGFQRDQNLQLHRRGH
Q9SFS1/70-92 1924 YVCEICNQGFQRDQNLQMHRRRH
Q9SFS1/148-170 1925 WICERCSKGYAVQSDYKAHLKTC
Q9SFS1/175-196 1926 HSCDCGRVFSRVESFIEHQDTC
Q9SSA6/575-598 1927 IHCLICHKTFASDDEFEDHTESKC
Q42410/39-61 1928 FTCKTCLKQFHSFQALGGHRASH
Q42410/82-104 1929 HPCPICGVEFPMGQALGGHMRRH
Q9XFP6/12-35 1930 VWCYYCDREFDDEKILVQHQKAKH
Q9XFP6/36-59 1931 FKCHVCHKKLSTASGMVIHVLQVH
O22238/218-241 1932 VSCGSCKKTFNSGNALESHNKAKH
Q42453/40-62 1933 FRCKTCLKEFSSFQALGGHRASH
Q42453/86-108 1934 HPCPICGVKFPMGQALGGHMRRH
Q42375/113-135 1935 YECKTCNRTFSSFQALGGHRASH
Q42375/188-210 1936 HECSICGSEFTSGQALGGHMRRH
O22759/159-181 1937 WKCEKCSKFYAVQSDWKAHTKIC
O22759/186-207 1938 YRCDCGTLFSRKDTFITHRAPG
O22759/82-104 1939 FVCEICNKGFQRDQNLQLHRRGH
Q9ZUL3/81-103 1940 FICEVCNKGFQREQNLQLHRRGH
Q9ZUL3/157-179 1941 WKCDKCSKRYAVQSDWKAHSKTC
Q9ZUL3/184-205 1942 YRCDCGTLFSRRDSFITHRAFC
P93751/95-117 1943 FECHYCFRNFPTSQALGGHQNAH
O81827/196-219 1944 VSCHKCGEKFSKLEAAEAHHLTKH
Q9ZUL4/82-104 1945 WKCEKCSKRYAVQSDWKAHSKTC
Q9ZUL4/109-130 1946 YRCDCGTIFSRRDSYITHRAFC
Q9ZUL4/6-28 1947 FICDVCNKGFQREQNLQLHRRGH
Q9SHD0/194-216 1948 FKCETCGKVFKSYQALGGHRASH
Q9SHD0/243-265 1949 HECPICFRVFTSGQALGGHKRSH
Q9SHD0/4-26 1950 YKCRFCFKSFINGRALGGHMRSH
O64936/131-153 1951 YQCNVCGRELPSYQALGGHKASH
O64936/179-201 1952 HKCSICHREFSTGHSLGGHKRLH
Q9SIJ0/65-87 1953 RPCTECGKQFGSLKALFGHMRCH
Q9SIJ0/148-170 1954 FECDGCKKVFGSHQALGGHRATH
Q9SIJ0/211-233 1955 HRCNICSRVFSSGQALGGHMRCH
Q9SLD4/47-69 1956 FECKTCNKRFSSFQALGGHRASH
Q9SLD4/94-116 1957 HKCSICSQSFGTGQALGGHMRRH
Q9ZU93/121-143 1958 FECPICKNPFTSEEEVSVHVESC
Q9SFT3/177-200 1959 CACPQCGEVFPKLESLEHHQAVRH
Q9ZQE0/244-266 1960 YTCPKCNGVFNTSQKFAAHMSSH
Q42423/80-102 1961 YKCSVCDKTFSSYQALGGHKASH
Q42423/136-158 1962 HVCTICNKSFPSGQALGGHKRCH
Q9ZWA6/146-168 1963 WKCEKCAKRYAVQSDWKAHSKTC
Q9ZWA6/173-194 1964 YRCDCGTIFSRRDSFITHRAFC
Q9ZWA6/70-92 1965 FLCEICGKGFQRDQNLQLHRRGH
O80942/39-61 1966 YTCSFCRREFKSAQALGGHMNVH
Q39217/90-112 1967 HKCSICSQSFGTGQALGGHMRRH
Q39217/43-65 1968 FECKTCNKRFSSFQALGGHRASH
Q39092/160-182 1969 FECETCEKVFKSYQALGGHRASH
Q39092/209-231 1970 HECPICAKVFTSGQALGGHKRSH
Q39092/5-27 1971 HKCKLCWKSFANGRALGGHMRSH
O81793/138-160 1972 PVCHICGRGFGSWKAVFGHMRMI
O64828/530-553 1973 LQCIPCGSHFGDKEQLLVHVQAVH
O64828/599-622 1974 FVCKFCGLKFNLLPDLGRHHQAEH
O64828/496-519 1975 FACAICLDSFVRRKLLEIHVEERH
O49591/251-278 1976 FMCLYCNELCRPFSSLEAVRKHMEAKSH
O49591/26-50 1977 LTCNACNMEFKDEEERNLHYKSDWH
O49591/90-114 1978 YTCAICAKGYRSSKAHEQHLQSRSH



There follow several examples of how to construct and select DNA-binding sub-domains
from libraries of natural zinc fingers.

Example 4: Human Zinc Finger Module 'Mini-Library'.

As a preliminary test of the efficacy of using natural zinc finger modules for constructing
novel DNA-binding domains, a 'mini-library' of natural, human zinc finger modules is
generated. The mini-library comprises 8 zinc finger modules, which have the following
nomenclature assigned to them in the human genome database: Zif268 finger 1, Zif268
finger 2, Sp1 finger 3, WT1 finger 1, O15391, O75626, ZN45 and Z165. Since more than
one zinc finger module belongs to each of the zinc finger proteins ZN45 and Z165,
we have called the selected modules ZN45-(AAA) and Z165-(GCC), respectively,
according to their predicted binding sites. We have also predicted the binding sites for the
zinc fingers O15391 and O75626. The preferred binding sites for Zif268 finger 1, Zif268
finger 2, Sp1 finger 3 and WT1 finger 1 are already known. The amino acid sequences of
each of the stated modules, and their predicted or previously determined binding
sequences, are shown in Table 3.

Two 3-zinc finger peptide libraries are prepared, containing the 8 zinc finger modules
stated. All novel 3-finger peptides contain a leader sequence, MAEERP (SEQ ID
NO:16), at the start of the peptide and are tagged by the sequence
LRQKDGGGSYPYDVPDYA (SEQ ID NO:1989) at the C-terminus. This sequence
provides (in the absence of a further C-terminal finger) a suitable terminus to the final
α-helix of the peptide, -LRQKD- (SEQ ID NO:1987), as found in wild-type Zif268; a short,
flexible linker sequence, GGGS (SEQ ID NO:2121); and an HA-tag (YPYDVPDYA
(SEQ ID NO:2122)), which is recognised by the HA-antibody. Adjacent zinc finger
modules are fused using the linker peptide sequence TGEKP (SEQ ID NO:3). The
peptide sequences described above are also displayed in Table 3.

In the first library (library 1), the 8 zinc finger modules are recombined in random order
to create 3-finger peptides with all possible combinations of the 8 zinc finger modules.
Such a procedure results in a library diversity of 512 (=8^3), comprising peptides that are
predicted to bind to any possible combination of the binding sites assigned in Table 3.
Library 1 allows novel 3-finger domains to be selected as a unit, for specified 9 bp target
sequences. Such 3-finger units may be used for the construction of poly-zinc finger
peptides as described in Moore, M., Choo, Y. & Klug, A. (2001) Proc. Natl. Acad. Sci.
USA 98: 1432-1436; and WO 01/53480.

In the second library (library 2), the 8 zinc finger modules are randomly recombined to
create 2-finger peptides which are all joined to the C-terminus of Zif268 finger 1. The
invariant finger 1 acts as an anchor for the selection, both by providing extra affinity to
stabilise the selection, and by fixing the register of the protein-DNA interaction (as
discussed supra). Such a library has a diversity of 64 (=8^2), and allows novel 2-finger
units to be selected for a given 6 bp target sequence. The resulting 2-finger units can be
recovered by PCR and used in the construction of poly-zinc finger peptides (based on
strings of 2-finger units), as described in WO 01/53480.

These two libraries (encoding 3-finger peptides) are screened, as described below, for the
ability of their encoded proteins to bind three different 9 bp binding sequences:
5'-GCG-TGG-GCG-3'; 5'-GGA-TAA-GCG-3'; and 5'-GCC-GAG-TGG-3'.

As positive controls, the genes encoding the 3-finger peptides predicted to bind the above 
target sequences are specifically constructed and tested in a similar manner. 
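
The library sizes and predicted target sites described above can be enumerated in a few
lines of Python. This is a minimal sketch rather than part of the described method: the
module numbers and 3 bp sites are taken from Table 3, and it assumes that the N-terminal
finger of a 3-finger peptide contacts the 3' triplet of the target, which is the reading
implied by the binding-site names #1#5#6 and #2#4#8 used in Protocol A. The function name
predicted_site is hypothetical.

```python
from itertools import product

# Module number -> known or predicted 3 bp site (Table 3).
SITES = {1: "GCG", 2: "TGG", 3: "GGG", 4: "GAG",
         5: "TAA", 6: "GGA", 7: "AAA", 8: "GCC"}


def predicted_site(fingers):
    """Predicted target (5'->3') for fingers listed from N- to C-terminus.

    Assumption: the N-terminal finger binds the 3' triplet, so the
    triplets are concatenated in reverse finger order.
    """
    return "-".join(SITES[f] for f in reversed(fingers))


# Library 1: any of the 8 modules at each of the three positions.
library1 = list(product(SITES, repeat=3))
# Library 2: Zif268 finger 1 fixed at the N-terminus, two variable fingers.
library2 = [(1,) + pair for pair in product(SITES, repeat=2)]

print(len(library1), len(library2))   # 512 64
print(predicted_site((1, 5, 6)))      # GGA-TAA-GCG
print(predicted_site((2, 4, 8)))      # GCC-GAG-TGG
```

Under these assumptions the combination of modules 1, 2 and 1 is the library member
predicted to bind 5'-GCG-TGG-GCG-3', and it is present in both libraries.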

X    FINGER/UNIT    SEQ ID NO    PEPTIDE SEQUENCE             SITE
1    ZIF268 F1      1979         YACPVESCDRRFSRSDELTRHIRIH    GCG
2    ZIF268 F2      1980         FQCRICMRNFSRSDHLSTHIRTH      TGG
3    Sp1 F3         1981         FSCPICEKRFMRSDHLTKHARRH      GGG
4    WT1 F1         1982         FMCAYPGCNKRYFKLSHLQMHSRKH    GAG
5    O15391         1983         FVCPFDVCNRKFAQSTNLKTHILTH    TAA^
6    O75626         1984         FKCQTCNKGFTQLAHLQKHYLVH      GGA^
7    ZN45-AAA       1985         YKCEECGKGFSQASNLLAHQRGH      AAA^
8    Z165-GCC       1986         YECNECGKSFAESSDLTRHRRIH      GCC^
9    leader         16           MAEERP
10   linker         3            TGEKP
11   G3S-HA-tag     1989         LRQKDGGGSYPYDVPDYA*

^ Predicted binding site. * indicates a translation stop codon.



Table 3. Nomenclature, amino acid sequences and known or predicted binding sequences
("SITE") of zinc finger modules and other peptide units used in library construction.

a. Human Zinc Finger Mini-Library Construction.

Two libraries are prepared, according to the scheme shown in Figure 2. The N-terminal
finger of the 3-finger construct is referred to as 'cassette A'. The central finger is
encoded by cassette B, and the third (C-terminal) finger module is called cassette C.
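
The layout of a complete 3-finger peptide (leader, cassette A, linker, cassette B, linker,
cassette C, tag) can be written out as a short Python sketch. The peptide units are those
of Table 3; joining them directly end to end is an assumption, although it is consistent
with the primers of Table 5, in which the linker and tag codons follow the final histidine
codon of each finger. The function name three_finger_peptide is hypothetical.

```python
# Peptide units from Table 3 (single-letter amino acid code).
FINGERS = {
    1: "YACPVESCDRRFSRSDELTRHIRIH",   # ZIF268 F1
    2: "FQCRICMRNFSRSDHLSTHIRTH",     # ZIF268 F2
    3: "FSCPICEKRFMRSDHLTKHARRH",     # Sp1 F3
    4: "FMCAYPGCNKRYFKLSHLQMHSRKH",   # WT1 F1
    5: "FVCPFDVCNRKFAQSTNLKTHILTH",   # O15391
    6: "FKCQTCNKGFTQLAHLQKHYLVH",     # O75626
    7: "YKCEECGKGFSQASNLLAHQRGH",     # ZN45-AAA
    8: "YECNECGKSFAESSDLTRHRRIH",     # Z165-GCC
}
LEADER = "MAEERP"
LINKER = "TGEKP"
TAG = "LRQKDGGGSYPYDVPDYA"            # LRQKD + GGGS + HA epitope


def three_finger_peptide(a, b, c):
    """Assemble leader, cassettes A, B and C, and the C-terminal tag.

    Assumption: units are concatenated directly, with one TGEKP linker
    between adjacent fingers.
    """
    return (LEADER + FINGERS[a] + LINKER + FINGERS[b]
            + LINKER + FINGERS[c] + TAG)


print(three_finger_peptide(1, 5, 6))
```

In this sketch cassette A is the first argument, so three_finger_peptide(1, 5, 6) is the
construct predicted to bind 5'-GGA-TAA-GCG-3'.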


Zinc Finger Cassettes 

Polynucleotide sequences encoding the amino acid sequences of the 8 zinc finger
modules shown in Table 3 are determined, taking into account E. coli codon preferences,
and the corresponding nucleotide sequences are synthesised as single stranded
oligonucleotides, examples of which are shown in Table 4. Also shown are the sequences
of exemplary linkers and an exemplary 3'-tag required for the assembly of 3-finger
domains. Double stranded cassettes encoding the zinc finger modules and relevant
leader, linker, and terminator sequences are generated by PCR according to the procedure
described below, using the appropriate oligonucleotide templates of Table 4, and primers
of Table 5.
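
The back-translation step can be illustrated with a short Python sketch. The codon table
below is not stated in the text: it is inferred from the finger, leader and linker rows of
Table 4, which use a single codon per amino acid, and TGG is included for tryptophan only
because it is the sole Trp codon. The tag row of Table 4 uses some different codons (for
example CGC for arginine), so a one-codon-per-residue table is a simplification.

```python
# One codon per residue, as observed in the finger rows of Table 4
# (assumption: these reflect the E. coli codon preferences applied).
CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}


def back_translate(peptide):
    """Return a coding sequence for the peptide using one codon per residue."""
    return "".join(CODON[aa] for aa in peptide)


print(back_translate("MAEERP"))   # leader
print(back_translate("TGEKP"))    # linker
print(back_translate("FQCRICMRNFSRSDHLSTHIRTH"))  # Zif268 finger 2
```

With this table the leader, linker and finger sequences of Table 4 are reproduced; a real
design would also need to check for unwanted restriction sites and primer cross-priming,
which the sketch ignores.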





X    CODE     FINGER             SEQ ID NO    NUCLEOTIDE SEQUENCE
1    AS144    ZIF268 F1          1990         TATGCGTGCCCGGTGGAAAGCTGCGATCGTCGTTTTAGCCGTAGCGATGAACTGACCCGTCATATTCGTATTCAT
2    ASUS     ZIF268 F2          1991         TTTCAGTGCCGTATTTGCATGCGTAACTTTAGCCGTAGCGATCATCTGAGCACCCATATTCGTACCCAT
3    AS148    Sp1 F3             1992         TTTAGCTGCCCGATTTGCGAAAAACGTTTTATGCGTAGCGATCATCTGACCAAACATGCGCGTCGTCAT
4    AS149    WT1 F1             1993         TTTATGTGCGCGTATCCGGGCTGCAACAAACGTTATTTTAAACTGAGCCATCTGCAGATGCATAGCCGTAAACAT
5    AS150    O15391             1994         TTTGTGTGCCCGTTTGATGTGTGCAACCGTAAATTTGCGCAGAGCACCAACCTGAAAACCCATATTCTGACCCAT
6    AS151    O75626             1995         TTTAAATGCCAGACCTGCAACAAAGGCTTTACCCAGCTGGCGCATCTGCAGAAACATTATCTGGTGCAT
7    AS152    ZN45-AAA           1996         TATAAATGCGAAGAATGCGGCAAAGGCTTTAGCCAGGCGAGCAACCTGCTGGCGCATCAGCGTGGCCAT
8    AS153    Z165-GCC           1997         TATGAATGCAACGAATGCGGCAAAAGCTTTGCGGAAAGCAGCGATCTGACCCGTCATCGTCGTATTCAT
9             MAEERP leader      1998         ATGGCGGAAGAACGTCCG
10            TGEKP linker       1999         ACCGGCGAAAAACCG
11            G3S-HA-tag (tag)   2000         CATCTGCGCCAGAAGGACGGCGGCGGCAGCTATCCGTATGATGTGCCGGATTATGCGTAA


Table 4. Nucleotide sequences encoding zinc finger modules and other peptide sequences
used in the construction of 3-finger proteins.



X 


CODE 


NAME 


SEQ ID NO 


SEQUENCE 


1 


ASS 


pETFwdl 


2001 


CGCTGACTTCCGCGTTTCC 


2 


AS86 


SDRev 


2002 


ATGTATATCTCCTTCTTAAAGTT 


3 


AS93 


ZnFlFwd 


2003 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 
CGTCCGTATGCGTGCCCGGTGGAAAG 


4 


AS94 


ZnF2Fwd 


2004 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 
CGTCCGTTTCAGTGCCGTATTTGCATG


5


AS95 


ZnFSFwd 


2005 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 

CGTCCGTTTAGCTGCCCGATTTGCG 


6 


AS96 


ZnF4Fwd 


2006 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 
CGTCCGTTTATGTGCGCGTATCCGGG 


7 


AS97 


ZnFSFwd 


2007 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 
CGTCCGTTTATGTGCGCGTATCCGGG 


8 


AS98 


ZnF6Fwd 


2008 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 
CGTCCGTTTAAATGCCAGACCTGCAAC 


9 


AS99 


ZnF7Fwd 


2009 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 
CGTCCGTATAAATGCGAAGAATGCGGC 


10 


ASlOO 


ZnFSFwd 


2010 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 

CGTCCGTATGAATGCAACGAATGCGGC 


11 


ASlOl 


ILinklRev 


2011 


CGGTTTTTCGCCGGTATGAATACGAATATGACGGG 


12 


AS102 


lLiiik2Rev 


2012 


CGGTTTTTCGCCGGTATGGGTACGAATATGGGTGC 


13 


AS103 


lLmk3Rev 


2013 


CGGTTTTTCGCCGGTATGACGACGCGCATGTTTGG 


14 


AS104 


lLiiik4Rev 


2014 


CGGTTTTTCGCCGGTATGTTTACGGCTATGCATCT 
G 


15 


AS105 


ILinkSRev 


2015 


CGGTTTTTCGCCGGTATGGGTCAGAATATGGGTTT 

TC 


16 


AS106 


lLink6Rev 


2016 


CGGTTTTTCGCCGGTATGCACCAGATAATGTTTCT 
GO 


17 


AS107 


lLink7Rev 


2017 


CGGTTTTTCGCCGGTATGGCCACGCTGATGCGC 


18 


AS 108 


ILinkSRev 


2018 


CGGTTTTTCGCCGGTATGAATACGACGATGACGGG 


19 


AS 109 


ILinklFwd 


2019 


CATACCGGCGAAAAACCGTATGCGTGCCCGGTGGA 
AAG 


10 


ASUO 


lLitiIc2Fwd 


2020 


CATACCGGCGAAAAACCGTTTCAGTGCCGTATTTG 
CATG 


11 


ASlll 


ILinkSFwd 


2021 


CATACCGGCGAAAAACCGTTTAGCTGCCCGATTTG 

CG 


12 


AS112 


lLinlc4Fwd 


2022 


CATACCGGCGAAAAACCGTTTATGTGCGCGTATCC 
GGG 


13 


AS113 


ILinkSFwd 


2023 


CATACCGGCGA/y^AACCGTTTGTGTGCCCGTTTGA 
TGTG 


14 


AS114 


lLinlc6Fwd 


2024 


CATACCGGCGAAAAACCGTTTAAATGCCAGACCTG 
CAAC 


15 


AS115 


ILinkTFwd 


2025 


CATACCGGCGAAAAACCGTATAAATGCGAAGAATG 
CGGC 


16 


AS116 


ILinkSFwd 


2026 


CATACCGGCGAAAAACCGTATGAATGCAACGAATG 

CGGC 


17 


ASU7 


2LinklRev 


2027 


TGGCTTCTCACCCGTGTGATGAATACGAATATGAC 
GGGTC 


18 


AS118 


2Link2Rev 


2026 


TGGCTTCTCACCCGTGTGATGGGTACGAATATGGG 

TGC 


19 


AS119 


2Link3Rev 


2029 


TGGCTTCTCACCCGTGTGATGACGACGCGCATGTT 
TGG 


20 


AS120 


2Link4Rev 


2030 


TGGCTTCTCACCCGTGTGATGTTTACGGCTATGCA 
TCTG


21


AS121 


2Link5Rev 


2031 


TGGCTTCTCACCCGTGTGATGGGTCAGAATATGGG 
TTTTC 


22 


AS 122 


2Link6Rev 


2032 


TGGCTTCTCACCCGTGTGATGCACCAGATAATGTT 
TCTGC 


23 


AS 123 


2Link7Rev 


2033 


TGGCTTCTCACCCGTGTGATGGCCACGCTGATGCG 

C 


24 


AS 124 


2Link8Rev 


2034 


TGGCTTCTCACCCGTGTGATGAATACGACGATGAC 
GGG 


25 


AS 125 


2LiiiklFwd 


2035 


CACGGGTGAGAAGCCATATGCGTGCCCGGTGGAAA 
G 


26 


AS 126 


2Link2Fwd 


2036 


CACGGGTGAGAAGCCATTTCAGTGCCGTATTTGCA 

TG 


27 


AS 127 


2Link3Fwd 


2037 


CACGGGTGAGAAGCCATTTAGCTGCCCGATTTGCG 


28 


AS128 


2Liiik:4Fwd 


2038 


CACGGGTGAGAAGCCATTTATGTGCGCGTATCCGG 
G 


29 


AS 129 


2Liiik5Fwd 


2039 


CACGGGTGAGAAGCCATTTGTGTGCCCGTTTGATG 
TG 


30 


AS130 


2Link6Fwd 


2040 


CACGGGTGAGAAGCCATTTAAATGCCAGACCTGCA 

AC 


31 


AS131 


2Link7Fwd 


2041 


CACGGGTGAGAAGCCATATAAATGCGAAGAATGCG 
GC 


32 


AS132 


2Liiik8Fwd 


2042 


CACGGGTGAGAAGCCATATGAATGCAACGAATGCG 

GC 


33 


AS133 


3HAlRev 


2043 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
AATACGAATATGACGGGTC 


34 


AS134 


3HA2Rev 


2044 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
GGTACGAATATGGGTGC 


35 


AS135 


3HA3Rev 


2045 


CTAGGAATTCTTACGCATAATCCGGCAGATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
ACGACGCGCATGTTTGG 


36 


AS136 


3HA4Rev 


2046 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
TTTACGGCTATGCATCTG 


37 


AS 137 


3HA5Rev 


2047 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
GGTCAGAATATGGGTTTTC 


38 


AS138 


3HA6Rev 


2048 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
CACCAGATAATGTTTCTGC 


39 


AS139 


3HA7Rev 


2049 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
GCCACGCTGATGCGC 


40 


AS140 


3HA8Rev 


2050 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
AATACGACGATGACGGG


41


AS 141 


Rev3 


2051 


CTAGGAATTCTTACGCATAATC 


42 


AS142 


ILinkRev 


2052 


CGGTTTTTCGCCGGTATG 


43 


ASMS 


2LinkRev 


2053 


TGGCTTCTCACCCGTGTG 



Table 5. Modifying oligonucleotides used for mini-library construction. 

1. Library 1.

Once made into double stranded DNA cassettes, the finger units are attached to T7
upstream expression sequences by PCR overlap extension, using the following protocol.

(a) Upstream sequences are first extracted from pET23a by PCR using primers
pETFwd1 and SDRev, generating the fragment pET5'.

(b) The fingers for cassette A are amplified with forward primers ZnFxFwd
(AS93-AS100) and reverse primers 1LinkxRev (AS101-AS108), where x is the number of a
particular finger from Tables 3 and 4, as indicated.

(c) The fingers for cassette B are amplified with forward primers 1LinkxFwd
(AS109-AS116) and reverse primers 2LinkxRev (AS117-AS124), where x refers to the
finger module number.

(d) The fingers for cassette C are amplified with forward primers 2LinkxFwd
(AS125-AS132) and reverse primers 3HAxRev (AS133-AS140), where x refers to the
appropriate zinc finger module.

The steps to create cassettes A, B and C are performed separately; however, mixed
populations of template oligonucleotides can be added to each PCR of steps (b), (c), and
(d) to produce a library of each cassette.

The final 3-finger library is assembled by overlap extension as outlined in Figure 2. In
the first step the mixed pool of cassette A is appended to the upstream sequences, pET5'.
Equimolar amounts are mixed and PCR-cycled in the absence of primers. The reaction
product is either purified immediately or reamplified before purification using primers
pETFwd1 and 1LinkRev.

In the second step cassette B (mixed pool) is appended to the product of the above step.
Again, equimolar amounts are mixed and PCR-cycled in the absence of primers. The
reaction product is either purified immediately or reamplified before purification using
primers pETFwd1 and 2LinkRev.

In the final step cassette C (mixed pool) is appended to the above product. Equimolar
amounts are mixed and PCR-cycled in the absence of primers. As before, the reaction
product may be purified immediately or reamplified before purification using primers
pETFwd1 and Rev3 (see also Figure 2).
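
The overlap that drives each extension step can be checked directly from the primer
sequences. The following Python sketch uses four primers copied from Table 5 (1LinkRev,
1Link2Fwd, 2LinkRev and 2Link2Fwd); the helper names reverse_complement and overlap are
hypothetical. It computes the top-strand 3' end that a reverse primer gives to the upstream
cassette and measures how much of it matches the 5' end of the downstream cassette.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap(upstream_top, downstream_top):
    """Length of the longest suffix of upstream_top that is also a
    prefix of downstream_top (both written 5'->3', top strand)."""
    for k in range(min(len(upstream_top), len(downstream_top)), 0, -1):
        if upstream_top.endswith(downstream_top[:k]):
            return k
    return 0


# Primer sequences from Table 5.
LINK1_REV = "CGGTTTTTCGCCGGTATG"                          # 1LinkRev
LINK1_2_FWD = "CATACCGGCGAAAAACCGTTTCAGTGCCGTATTTGCATG"   # 1Link2Fwd
LINK2_REV = "TGGCTTCTCACCCGTGTG"                          # 2LinkRev
LINK2_2_FWD = "CACGGGTGAGAAGCCATTTCAGTGCCGTATTTGCATG"     # 2Link2Fwd

end_of_a = reverse_complement(LINK1_REV)   # 3' end of cassette A, top strand
end_of_b = reverse_complement(LINK2_REV)   # 3' end of cassette B, top strand

print(end_of_a, overlap(end_of_a, LINK1_2_FWD))   # CATACCGGCGAAAAACCG 18
print(end_of_b, overlap(end_of_b, LINK2_2_FWD))   # CACACGGGTGAGAAGCCA 16
```

Both junctions encode His-Thr-Gly-Glu-Lys-Pro, but with different codons, so the A/B and
B/C junctions do not share the same overlap sequence. That this difference was chosen to
keep the two junctions from cross-priming is an interpretation, not something the text
states.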

2. Library 2.

Library 2 is assembled in a similar manner to Library 1 except that cassette A is
represented by Zif268 finger 1 only.

The final PCR products containing T7 promoter sequences and encoding 3-finger
peptides attached to an HA-antibody tag are purified and used for the production of
protein.

b. Zinc Finger Library Screening.

Two exemplary methods for screening zinc finger libraries, such as those produced
above, are described in Protocol A and Protocol B, below.

Protocol A:

The peptides of library 1 and library 2 are screened to select 3-zinc finger domains which
bind the sequences: 5'-GCG-TGG-GCG-3'; 5'-GGA-TAA-GCG-3'; and 5'-GCC-GAG-TGG-3'.
Since library 2 contains Zif268 finger 1 in the N-terminal position, in theory,
these peptides should only bind the sequences 5'-GCG-TGG-GCG-3' and
5'-GGA-TAA-GCG-3'. Hence, library 2 is effectively used to select the 2-finger units which
bind most strongly to the 6 bp sequences 5'-GCG-TGG-3' and 5'-GGA-TAA-3'. Double
stranded binding sites for use in the selection protocol are generated by annealing the
complementary oligonucleotides: Zif.b site and Zif site RC (AS154 and AS155);
#1#5#6.b and #1#5#6 RC (AS156 and AS157); and #2#4#8.b and #2#4#8 RC (AS158
and AS159). The top strand of each binding site is biotinylated, allowing capture of
binding site/zinc finger/HA-antibody ternary complexes on the streptavidin-coated plate in
an ELISA screening assay. The oligonucleotides are displayed in Table 6, below.



X    Code     Name           SEQ ID NO    Sequence
1    AS154    Zif.b site     2054         TTTTTTTTTTGCGTGGGCGTTTTTTTTTT
2    AS155    Zif site RC    2055         AAAAAAAAAACGCCCACGCAAAAAAAAAA
3    AS156    #1#5#6.b       2056         TTTTTTTTTTGGATAAGCGTTTTTTTTTT
4    AS157    #1#5#6 RC      2057         AAAAAAAAAACGCTTATCCAAAAAAAAAA
5    AS158    #2#4#8.b       2058         TTTTTTTTTGCCTGTTGGTTTTTTTTTTT
6    AS159    #2#4#8 RC      2059         AAAAAAAAAAACCAACAGGCAAAAAAAAA


Table 6. Oligonucleotide sequences used to generate double stranded binding sites used
in the selection procedure.

The PCR-amplified 3-finger constructs are gel-purified from a 1% TAE-agarose gel using
the Gel Extraction Kit (Qiagen) and quantified based on absorbance at 260 nm. Dilutions
(in 0.25 mg/ml λ DNA) of DNA template encoding either library 1 or library 2 are prepared
at final total template concentrations of 4.2 fM and 1 fM, respectively. At these
concentrations 1 µl of template contains approximately 2500 and 600 molecules of library
1 or library 2, respectively. At such low concentrations, the samples must be PCR
amplified to generate enough template for protein expression. Hence, these 1 µl aliquots
are taken and added to 1 ml PCR pre-mix, containing primers Rev3 (AS141) and
pETFwd2 (primer sequences shown below, see Table 7). The PCR pre-mixes are then
aliquoted into 96 (or 384) well plates at 10 µl per well, which is the equivalent of
approximately 25 or 6 molecules of library 1 or library 2 template, respectively.
Templates are amplified using 30 cycles of PCR. After this first round of PCR, 0.5 µl
aliquots of PCR product are added to new 10 µl PCR pre-mixes (in 96 or 384 well
format), containing nested primers, pETFwd3 and Rev3, and amplified for another 30
cycles. The resultant product is concentrated enough to perform in vitro transcription /
translation.
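
The molecule counts quoted above follow from the concentrations by simple arithmetic,
which the following Python sketch reproduces. It is a worked check only: the function names
are hypothetical, and the 1 µl of template added to the 1 ml pre-mix is neglected when
computing the per-well figure.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules(conc_molar, volume_litres):
    """Number of molecules in a given volume at a given molar concentration."""
    return conc_molar * volume_litres * AVOGADRO


def per_well(conc_molar, aliquot_l=1e-6, premix_l=1e-3, well_l=10e-6):
    """Average template molecules per well after a 1 µl aliquot is diluted
    into 1 ml of pre-mix and dispensed at 10 µl per well (the added 1 µl
    of volume is neglected)."""
    return molecules(conc_molar, aliquot_l) * well_l / premix_l


for name, conc in (("library 1", 4.2e-15), ("library 2", 1.0e-15)):
    print(name, round(molecules(conc, 1e-6)), round(per_well(conc), 1))
```

For 4.2 fM and 1 fM this gives roughly 2,500 and 600 molecules per microlitre, and roughly
25 and 6 molecules per 10 µl well, in line with the figures in the text.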

In vitro translation experiments using TNT PCR coupled transcription-translation mix
(Promega) are assembled according to the manufacturer's instructions. Typically 5 µl
final volume contains 1 µl of each PCR product and 4 µl rabbit reticulocyte pre-mix
(containing 20 µM methionine, 12.5 µg/ml λ HindIII digest (Roche), 500 µM ZnCl2
(Sigma), 0.7 µl H2O, 40 nM PCR-amplified DNA template). Reactions are incubated at
30°C for 90 minutes. 50 µl PBS binding buffer containing 0.1% BSA (Sigma), 0.5%
Tween 20 (Sigma), 50 µM ZnCl2, 10 nM of the appropriate biotinylated binding site and 25
mU/ml rat 3F10 anti-HA HRP conjugate (Roche) is added to the translation mix and
incubated for 45 minutes at room temperature. The binding mix is thereafter transferred
to pre-blocked black streptavidin-coated 8-well strips or 96 / 384 well plates (Roche), and
the ternary complexes containing 3-finger peptide, biotinylated binding site and anti-HA
HRP antibody are captured while shaking at 200 rpm for 45 minutes at room temperature.
The wells are then washed five times with 100 µl PBS binding buffer containing 0.1%
BSA (Sigma), 0.5% Tween 20 (Sigma), 50 µM ZnCl2 to remove unbound components.
Finally, the retained HRP activity is measured by adding 50 µl QuantaBlu fluorogenic
HRP substrate (Pierce). Figure 3 demonstrates the capture and detection of target
site-binding zinc finger peptides using the assay described. Fluorescence is measured on a
SpectraMax Gemini XS (Molecular Devices) fluorescence microplate reader at 320 nm
excitation, 433 nm emission and 420 nm cut-off values.

The wells that give the highest levels of fluorescence are those which contain the highest
number of, or the tightest binding, 3-finger peptides. PCR products from the second PCR
amplification stage, corresponding to such samples, are purified from TAE-agarose gels
and quantified, as above. Pure PCR products are diluted to approximately 50 molecules
per µl (which is equivalent to approximately 100 aM concentration) in 0.25 mg/ml λ
DNA. As above, 1 µl samples of template are added to 1 ml PCR pre-mix containing
primers pETFwd4 and Rev3. 10 µl aliquots are placed in each well of a 96 well plate.
At this stage, there are (on average) 0.5 template molecules per aliquot. Therefore,
generally speaking, half of the samples will contain no template and half will contain a
single template molecule. Samples are then PCR amplified using 30 cycles. Again, 0.5
µl PCR samples are taken from each well and amplified again by 30 cycles of PCR using
the nested primers pETFwd5 and Rev3. 1 µl of each of these PCR products is used for
protein expression, as described above. At this stage, the highest levels of fluorescence
correspond to the samples containing the tightest binding 3-finger peptides. The PCR
product encoding such peptides is purified, as before, and can be sequenced to determine
the protein sequence of the optimal 3-zinc finger domain for the appropriate binding site.
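
The occupancy of wells at an average of 0.5 template molecules per aliquot can be
estimated with a short Python sketch. It assumes that template molecules are distributed
among wells according to a Poisson distribution; this is a standard model for limiting
dilution but is an assumption added here, not something stated in the text.

```python
from math import exp, factorial


def poisson(k, mean):
    """Probability of exactly k template molecules in a well."""
    return mean ** k * exp(-mean) / factorial(k)


# 50 molecules in 1 ml of pre-mix, dispensed at 10 µl per well.
mean_per_well = 50 * 10 / 1000

p_empty = poisson(0, mean_per_well)
p_single = poisson(1, mean_per_well)
p_multiple = 1 - p_empty - p_single

print(f"empty {p_empty:.3f}, single {p_single:.3f}, multiple {p_multiple:.3f}")
```

Under this assumption about 61% of wells would be empty, about 30% would hold a single
template and about 9% more than one, so the "half and half" description is best read as a
rule of thumb; most wells that give a product would still derive from a single template.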

If further rounds of selection are required, PCR amplification can be conducted with the
nested primers pETFwd6, pETFwd9 and pETFwd7, also shown below (Table 7).



NAME       SEQ ID NO    SEQUENCE
pETFwd1    2060         CGCTGACTTCCGCGTTTCC
pETFwd2    2061         TCCAGACTTTACGAAACACGG
pETFwd3    2062         CGAAGACCATTCATGTTGTTGC
pETFwd4    2063         GTCGCAGACGTTTTGCAGC
pETFwd5    2064         GCAGTCGCTTCACGTTCGC
pETFwd6    2065         CGCTCGCGTATCGGTGATTC
pETFwd9    2066         CATTCTGCTAACCAGTAAGGC
pETFwd7    2067         GCCTAGCCGGGTCCTCAAC


Table 7: Primers used for PCR amplification of 3-finger cassettes (as constructed by the
procedure of Figure 2) to provide template used in screening zinc finger libraries.

Protocol B:

The peptides of library 2 were screened to select 3-zinc finger domains which bind the
sequences 5'-GCG-TGG-GCG-3' and 5'-GGG-AGG-CCT-3'. Double stranded binding
sites for use in the selection protocol were generated by annealing the complementary
oligonucleotides: Zif.b site and Zif site RC (AS154 and AS155, shown above), which
generated the 5'-GCG-TGG-GCG-3' binding site; and the oligonucleotides
5'-TTTTTTTTTTGGGAGGCCTTTTTTTTTTT-3' (SEQ ID NO:2123) and
5'-AAAAAAAAAAAGGCCTCCCAAAAAAAAAA-3' (SEQ ID NO:2124), which
generated the 5'-GGG-AGG-CCT-3' binding site. The top strand of each binding site
was biotinylated, allowing capture of binding site/zinc finger/HA-antibody ternary
complexes onto a streptavidin-coated plate in an ELISA screening assay.

The 3-finger library 2 constructs were cloned into the multiple cloning site of vector
pET23a (Novagen), using appropriate restriction sites. This library was then transformed
into E. coli and plated out to grow single colonies. 384 colonies (which should represent
the vast majority of the 64 member library) were picked into 2xYT media with ampicillin
and cultures grown at 37°C overnight. Library 2 expression cassettes were recovered
from bacteria by PCR using primers pETFwdx (where x is 1-7, e.g. pETFwd1) and Rev3
as described in Protocol A above.
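
The statement that 384 colonies should represent the vast majority of a 64 member library
can be checked with a simple sampling estimate. The Python sketch below assumes that every
library member is equally likely to be picked and that colonies are independent; real
libraries are rarely perfectly uniform, so the result is an upper-bound style estimate. The
function name expected_coverage is hypothetical.

```python
def expected_coverage(library_size, clones_picked):
    """Expected fraction of library members seen at least once, assuming
    uniform and independent sampling of clones."""
    return 1 - (1 - 1 / library_size) ** clones_picked


coverage = expected_coverage(64, 384)
print(f"expected coverage: {coverage:.4f}")
print(f"expected members missed: {64 * (1 - coverage):.2f}")
```

With six-fold oversampling the expected coverage is above 99%, which corresponds to fewer
than one library member missed on average.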

In vitro coupled transcription / translation of PCR products was conducted as described
above, with the difference that each of the 384 zinc finger peptides was screened
individually in a well of a 384 well plate. The library was screened against the
5'-GCG-TGG-GCG-3' and 5'-GGG-AGG-CCT-3' binding sites, as detailed in Protocol A. Wells
that yielded the highest levels of fluorescence were those which contained the tightest
binding 3-finger peptides. The ELISA results from the screen of the 384 samples against
the 5'-GCG-TGG-GCG-3' site are shown in Figure 4. Six constructs displayed
significant binding to the target site and these are termed C8, G16, I19, I23, J19 and K19
according to their coordinates on the 384-well plate. Similarly, one construct (B10)
showed strong binding to the 5'-GGG-AGG-CCT-3' target site. PCR products encoding
the tightest binding peptides can be purified, as described supra, and sequenced.
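
Construct names such as C8 or I19 are plate coordinates, and ranking wells by signal is a
mechanical step that can be sketched in Python. The sketch assumes a standard 384-well
layout (rows A to P, columns 1 to 24) and readings supplied in row-major order; both the
ordering and the function names are assumptions, and no particular hit threshold is implied
by the text.

```python
import string


def well_name(index, columns=24):
    """Map a 0-based, row-major well index to a plate coordinate.

    For a 384-well plate (24 columns) this runs from A1 to P24.
    """
    row, col = divmod(index, columns)
    return f"{string.ascii_uppercase[row]}{col + 1}"


def top_wells(signals, n):
    """Names of the n wells with the highest fluorescence readings."""
    ranked = sorted(range(len(signals)), key=lambda i: signals[i], reverse=True)
    return [well_name(i) for i in ranked[:n]]


print(well_name(0), well_name(210), well_name(383))   # A1 I19 P24
```

Given a list of 384 fluorescence readings, top_wells(readings, 6) would return the six
brightest wells by name, ready to be matched against the PCR products retained for
sequencing.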

Some of the selected constructs, C8, J19, K19, I23 and G16 (which bind the
5'-GCG-TGG-GCG-3' site) and B10 (which binds the 5'-GGG-AGG-CCT-3' site), were
screened against a range of different binding sites to test their specificity. The sites used
were: 5'-GCG-TGG-GCG-3'; 5'-CCA-CTC-GGC-3'; 5'-CCT-AGG-GGG-3';
5'-GGA-TAA-GCG-3'; 5'-GGG-AGG-CCT-3'; 5'-GCG-TAA-GGA-3'; and
5'-GCG-GGG-GGA-3'. The binding assay was conducted as described above. The results
(Figure 5) show that the selected 3-zinc finger peptides bind preferentially to their target
site, in comparison to the alternative binding sites tested.

Example 5: Human Zinc Finger Module Libraries for Rapid Selection of 2-Finger
Units.

The preferred subunits of a poly-zinc finger construction strategy are in the form of
two-finger sub-domains. Assuming that there are 1,000 individual natural finger modules, a
library of all combinations of such zinc finger modules, in 2-finger units, would contain
1,000,000 members. All of the 1,000 natural finger modules would have to be made from
oligonucleotides, and the expense would be considerable. Furthermore, this figure is
likely to be an underestimate of the number of natural fingers. Hence, due to the huge
numbers of natural, human zinc finger modules available, it is advantageous to limit the
size of the libraries screened, as discussed in the Description. One way in which library
size can be reduced is to limit the library members to zinc finger modules which are
predicted to bind the desired sequence. For instance, based on the target sites in Example
1, if 2-finger domains are required to bind the sequence 5'-GCG-TGG-3', an individual
library can be constructed from the zinc finger modules predicted to bind the sequences
5'-GCG-3' and 5'-TGG-3'. Equally, if the sequence 5'-GGA-TAA-3' is to be targeted,
zinc finger modules predicted to bind the sequences 5'-GGA-3' and 5'-TAA-3' can
be used. Table 8 shows the natural, human zinc finger modules from Example 1, which
are predicted to bind the aforementioned 3 bp sequences.

5'-GCG-3'                5'-TGG-3'                5'-GGA-3'          5'-TAA-3'
Zif268 finger 1 (GCG)    Zif268 finger 2 (TGG)    BCL6 (NGA)         TYY1 (NAA)
Zif268 finger 3 (GCG)    MAZ finger 2 (TGG)       O75626 (GGA)       O15391 (YAA)
Sp1 finger 2 (GCG)       WT1 finger 3 (TGG)       ZN45 (N%A)         O75626 (YAA)
WT1 finger 4 (GCG)       SP4 (NGG)                O15535 (GNA)       ZN45 (N%A)
BTE1 (GCG)               BTE1 (NGG)               Q15776 (GNA)       Z136 (TNN)
O43296 (GNG)             Z136 (TNN)               O60893 (GNA)       Z239 (YAA)
Z174 (GCG, RNA)          Q15776 (NGG)             Z132 (a) (GGA)     Q15776 (a) (TNA)
Z202 (GCG, RNA)          ZN84 (YGG)               Z132 (b) (GGA)     Q15776 (b) (TNA)
                                                  Z132 (GGN)         Z195 (YAA)
                                                  ZN85 (GGA)         ZN84 (YAA)
                                                                     O75346 (TAA)
                                                                     ZN43 (TAA)

Table 8. The natural, human zinc finger modules predicted to bind the sequences
5'-GCG-3', 5'-TGG-3', 5'-GGA-3' and 5'-TAA-3'.

On the basis of the specificities shown in Table 8, a library of 2-finger units to target the 6
bp sequence 5'-GCG-TGG-3' has 64 (8x8) members, and a library to target the sequence
5'-GGA-TAA-3' has 120 (10x12) members. To screen sample sizes of this magnitude
we can construct each 2-finger unit specifically (using, for example, an 8x8 or 10x12
matrix arrangement), and assay the samples containing individual clones using the
fluorescent-ELISA protocol of Example 4. Such a procedure can save time in
comparison to constructing all possible 64 or 120 variants in a random fashion (as a
library), as described in Example 4, because with a random library the number of
constructs screened would have to be considerably higher.
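
A matrix arrangement of this kind can be laid out programmatically. The Python sketch
below builds the 8x8 grid for the 5'-GCG-TGG-3' target from the first two columns of
Table 8 and counts the members of the 10x12 grid for 5'-GGA-TAA-3'. Assigning the first
finger of each pair to plate rows and the second to plate columns is an illustrative
assumption, as are the variable names; following the primer scheme described for this
Example, the TGG-binding module is taken as the N-terminal finger of the pair.

```python
from itertools import product
import string

# Module names from Table 8.
GCG = ["Zif268 F1", "Zif268 F3", "Sp1 F2", "WT1 F4",
       "BTE1", "O43296", "Z174", "Z202"]
TGG = ["Zif268 F2", "MAZ F2", "WT1 F3", "SP4",
       "BTE1", "Z136", "Q15776", "ZN84"]
GGA = ["BCL6", "O75626", "ZN45", "O15535", "Q15776",
       "O60893", "Z132(a)", "Z132(b)", "Z132", "ZN85"]
TAA = ["TYY1", "O15391", "O75626", "ZN45", "Z136", "Z239",
       "Q15776(a)", "Q15776(b)", "Z195", "ZN84", "O75346", "ZN43"]


def matrix(first_fingers, second_fingers):
    """Map a well name to a (first finger, second finger) pair, with one
    plate row per first finger and one plate column per second finger."""
    return {
        f"{string.ascii_uppercase[r]}{c + 1}": (first, second)
        for (r, first), (c, second) in product(
            enumerate(first_fingers), enumerate(second_fingers))
    }


gcg_tgg = matrix(TGG, GCG)   # N-terminal finger of the pair binds TGG
gga_taa = matrix(TAA, GGA)   # N-terminal finger of the pair binds TAA

print(len(gcg_tgg), len(gga_taa))   # 64 120
print(gcg_tgg["A1"])
```

Because every well holds a known construct, each of the 64 or 120 variants is assayed
exactly once, which is the saving over random sampling that the text describes.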

a. Construction of 2-Finger Domains to Bind 5'-GCG-TGG-3'



A 64 member, 2-finger library is constructed from the natural, human zinc finger modules
predicted to bind the sequences 5'-GCG-3' and 5'-TGG-3' (Table 8, columns 1 and 2).
The 2-finger library units are all attached to the C-terminus of Zif268 finger 1, which acts
as an anchor finger. The construction protocol differs from that described in
Example 4, as described below.

Zinc Finger Cassettes

Nucleotide sequences encoding the amino acid sequences of the 16 zinc finger modules
(Table 8, columns 1 and 2) are determined, taking into account human codon preferences,
and the corresponding nucleotide sequences are synthesised as single stranded
oligonucleotides, shown in Table 9. Double stranded cassettes encoding the zinc finger
modules and flanking linker sequences are generated by PCR using the appropriate
primers, shown in Table 10.



X    FINGER       SEQ ID NO    NUCLEOTIDE SEQUENCE
1    Zif268 F1    2068         TACGCCTGCCCCGTGGAGAGCTGCGACCGCCGCTTCAGCCGCAGCGACGAGCTGACCCGCCACATCCGCATCCAC
2    Zif268 F3    2069         TTCGCCTGCGACATCTGCGGCCGCAAGTTCGCCCGCAGCGACGAGCGCAAGCGCCACACCAAGATCCAC
3    Sp1 F2       2070         TTCGCCTGCAGCTGGCAGGACTGCAACAAGAAGTTCGCCCGCAGCGACGAGCTGGCCCGCCACTACCGCACCCAC
4    WT1 F4       2071         TTCAGCTGCCGCTGGCCCAGCTGCCAGAAGAAGTTCGCCCGCAGCGACGAGCTGGTGCGCCACCACAACATGCAC
5    BTE1         2072         TTCCCCTGCACCTGGCCCGACTGCCTGAAGAAGTTCAGCCGCAGCGACGAGCTGACCCGCCACTACCGCACCCAC
6    O43296       2073         TACGAGTGCGTGGAGTGCGGCAAGGCCTTCACCCGCATGAGCGGCCTGACCCGCCACAAGCGCATCCAC
7    Z174         2074         TACAAGTGCGACGACTGCGGCAAGAGCTTCACCTGGAACAGCGAGCTGAAGCGCCACAAGCGCGTGCAC
8    Z202         2075         TACCGCTGCGACGACTGCGGCAAGCACTTCCGCTGGACCAGCGACCTGGTGCGCCACCAGCGCACCCAC
9    Zif268 F2    2076         TTCCAGTGCCGCATCTGCATGCGCAACTTCAGCCGCAGCGACCACCTGAGCACCCACATCCGCACCCAC
10   MAZ F2       2077         TACAACTGCAGCCACTGCGGCAAGAGCTTCAGCCGCCCCGACCACCTGAACAGCCACGTGCGCCAGGTGCAC
11   WT1 F3       2078         TTCCAGTGCAAGACCTGCCAGCGCAAGTTCAGCCGCAGCGACCACCTGAAGACCCACACCCGCACCCAC
12   Sp4          2079         CACAAGTGCCCCTACAGCGGCTGCGGCAAGGTGTACGGCAAGAGCAGCCACCTGAAGGCCCACTACCGCGTGCAC
13   BTE1         2080         CACAAGTGCCCCTACAGCGGCTGCGGCAAGGTGTACGGCAAGAGCAGCCACCTGAAGGCCCACTACCGCGTGCAC
14   Z136         2081         TTCGAGTGCAAGCGCTGCGGCAAGGCCTTCCGCAGCAGCAGCAGCTTCCGCCTGCACGAGCGCACCCAC
15   Q15776       2082         TACGAGTGCGACGAGTGCGGCAAGACCTTCCGCCGCAGCAGCCACCTGATCGGCCACCAGCGCAGCCAC
16   ZN84         2083         TACGAGTGCGGCGAGTGCGGCAAGGCCTTCAGCCGCAAGAGCCACCTGATCAGCCACTGGCGCACCCAC



Binding. 



Table 9. Nucleotide sequences of zinc finger modules and nucleotide sequences encoding
other peptide sequences used in the construction of peptides to bind the sequence
5'-GCG-TGG-3'.

The primers used to amplify the N-terminal finger of the pair (the equivalent of cassette
B, above) add TGEKP (SEQ ID NO:3) linker sequences, and the restriction site XmaI
(5'-CCC-GGG-3') at the 5' end and an AgeI site (5'-ACC-GGT-3') at the 3' end. AgeI and
XmaI create compatible ends, but have unique restriction sites. These primers are called
CasBxFwd and CasBxRev, respectively, where x refers to the number of the zinc finger
module in Table 9. The primers used to amplify the C-terminal finger of the pair (the
equivalent of cassette C, above) add TGEKP (SEQ ID NO:3) linker sequences, and the
restriction site XmaI at the 5' end and a sequence encoding LRQKDGGGS (SEQ ID
NO:2125), containing a restriction site for BamHI at the 3' end. These primers are
referred to as CasCxFwd and CasCxRev, respectively. The 16 individual zinc finger
cassettes are then purified using the QIAquick PCR purification kit (Qiagen).
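
The compatibility of the AgeI and XmaI ends can be illustrated with a short Python sketch.
The recognition sequences are those given in the text; the cut positions (one base into the
site on each strand, i.e. A^CCGGT, C^CCGGG and G^GATCC) are the standard ones for these
enzymes rather than something the text states, and the helper names are hypothetical.

```python
# Recognition site (5'->3') and number of bases before the top-strand cut.
ENZYMES = {
    "XmaI": ("CCCGGG", 1),
    "AgeI": ("ACCGGT", 1),
    "BamHI": ("GGATCC", 1),
}


def overhang(name):
    """5' overhang left by a palindromic enzyme that cuts the same number
    of bases into the site on each strand."""
    site, cut = ENZYMES[name]
    return site[cut:len(site) - cut]


def ligate(left, right):
    """Junction formed when the upstream end cut by `left` is joined to the
    downstream end cut by `right`."""
    left_site, left_cut = ENZYMES[left]
    right_site, right_cut = ENZYMES[right]
    if overhang(left) != overhang(right):
        raise ValueError("ends are not compatible")
    return left_site[:left_cut] + overhang(left) + right_site[len(right_site) - right_cut:]


junction = ligate("AgeI", "XmaI")
print(overhang("AgeI"), overhang("XmaI"), junction)   # CCGG CCGG ACCGGG
```

The hybrid junction ACCGGG is recognised by neither enzyme, so once a cassette B end has
been joined to a cassette C end the junction cannot be re-cut by AgeI or XmaI. In frame it
reads ACC GGG (Thr-Gly), the start of the TGEKP linker.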



Name        SEQ ID NO   Sequence

CasB9Fwd    2084   GATCCCCGGGGAGAAGCCCTTCCAGTGCCGCATCTGCAT
CasB10Fwd   2085   GATCCCCGGGGAGAAGCCCTACAACTGCAGCGACTGCGG
CasB11Fwd   2086   GATCCCCGGGGAGAAGCCCTTCCAGTGCAAGACCTGCCA
CasB12Fwd   2087   GATCCCCGGGGAGAAGCCCCACAAGTGCCCCTACAGCG
CasB13Fwd   2088   GATCCCCGGGGAGAAGCCCCACAAGTGCCCCTACAGCG
CasB14Fwd   2089   GATCCCCGGGGAGAAGCCCTTCGAGTGCAAGCGCTGCG
CasB15Fwd   2090   GATCCCCGGGGAGAAGCCCTACGAGTGCGACGAGTGCG
CasB16Fwd   2091   GATCCCCGGGGAGAAGCCCTACGAGTGCGGCGAGTGCG
CasC1Fwd    2092   GATCCCCGGGGAGAAGCCCTACGCCTGCCCCGTGGAG
CasC2Fwd    2093   GATCCCCGGGGAGAAGCCCTTCGCCTGCGACATCTGCG
CasC3Fwd    2094   GATCCCCGGGGAGAAGCCCTTCGCCTGCAGCTGGCAGG
CasC4Fwd    2095   GATCCCCGGGGAGAAGCCCTTCAGCTGCCGCTGGCCC
CasC5Fwd    2096   GATCCCCGGGGAGAAGCCCTTCCCCTGCACCTGGCCC
CasC6Fwd    2097   GATCCCCGGGGAGAAGCCCTACGAGTGCGTGGAGTGCG
CasC7Fwd    2098   GATCCCCGGGGAGAAGCCCTACAAGTGCGACGACTGCGG
CasC8Fwd    2099   GATCCCCGGGGAGAAGCCCTACCGCTGCGACGACTGCG
CasB9Rev    2100   CTTCTCACCGGTGTGGGTGCGGATGTGGGTG
CasB10Rev   2101   CTTCTCACCGGTGTGCACCTGGCGCACGTG
CasB11Rev   2102   CTTCTCACCGGTGTGGGTGCGGGTGTGGGT
CasB12Rev   2103   CTTCTCACCGGTGTGCACGCGGTAGTGGGC
CasB13Rev   2104   CTTCTCACCGGTGTGCACGCGGTAGTGGGC
CasB14Rev   2105   CTTCTCACCGGTGTGGGTGCGCTCGTGCAG
CasB15Rev   2106   CTTCTCACCGGTGTGGCTGCGCTGGTGGCC
CasB16Rev   2107   CTTCTCACCGGTGTGGGTGCGCCAGTGGCT
CasC1Rev    2108   GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATGCGGATGTGGCGG
CasC2Rev    2109   GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATCTTGGTGTGGCGC
CasC3Rev    2110   GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGGTGCGGTAGTGGCG
CasC4Rev    2111   GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGCATGTTGTGGTGGCGC
CasC5Rev    2112   GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGGTGCGGTAGTGGCG
CasC6Rev    2113   GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATGCGCTTGTGGCGG
CasC7Rev    2114   GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGCACGCGCTTGTGGCG
CasC8Rev    2115   GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGGTGCGCTGGTGGCG
ScaRev      2116   GTCATGCCATCCGTAAGATGC
GSFwd       2117   GGCGGATCCTATCCGTATGATGTG
Zif1Fwd     2118   AGAGAGAGAGAGATCTATGGCGGAAGAACGTCCGTATGCGTGCCCGGTGGAAAG
Zif1Rev     2119   AGCCGGATCCCAAACACCGGTATGAATACGAATATGACGGG
pETRev1     2120   AGTGTAGCGGTCACGCTGC



Table 10. Oligonucleotides used for PCR construction of rapid zinc finger library.
Annealing sequences are shown in bold, restriction sites are underlined.
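The bold and underline formatting referred to in the caption does not survive in plain text, but the restriction sites can be located by a simple string search. The sketch below is illustrative only: it scans a few of the Table 10 primers (names and sequences as read from the table) for the recognition sequences of the enzymes named in the text. The helper name `find_sites` is hypothetical.

```python
# Recognition sequences of the enzymes named in the text (standard, palindromic sites).
SITES = {"XmaI": "CCCGGG", "AgeI": "ACCGGT", "BamHI": "GGATCC", "BglII": "AGATCT"}

# A few primers copied from Table 10 (sequences joined across line breaks).
PRIMERS = {
    "CasB9Fwd": "GATCCCCGGGGAGAAGCCCTTCCAGTGCCGCATCTGCAT",
    "CasB9Rev": "CTTCTCACCGGTGTGGGTGCGGATGTGGGTG",
    "CasC1Rev": "GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATGCGGATGTGGCGG",
    "Zif1Fwd": "AGAGAGAGAGAGATCTATGGCGGAAGAACGTCCGTATGCGTGCCCGGTGGAAAG",
    "Zif1Rev": "AGCCGGATCCCAAACACCGGTATGAATACGAATATGACGGG",
}


def find_sites(seq):
    """Return {enzyme: [0-based start positions]} for every site present in seq."""
    hits = {}
    for enzyme, site in SITES.items():
        positions = [
            i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site
        ]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

Under these assumptions the search agrees with the description in the text: the cassette B forward primer carries an XmaI site, the cassette B reverse primer an AgeI site, the cassette C reverse primer a BamHI site, and the Zif268 finger 1 primers carry BglII (forward) and AgeI plus BamHI (reverse).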

3-Finger Library Peptides

The 2 natural zinc finger modules for each construct are appended to the C-terminus of
Zif268 finger 1 (as in Example 4, library 2). Hence, a plasmid construct containing
Zif268 finger 1 and appropriate restriction sites for cloning of the two natural finger
modules is also prepared. The construction and cloning procedure for the 3-finger
libraries follows (see also Figure 6).

(a) The plasmid pET23a/TFZ-HA was assembled by PCR amplification of
plasmid pTFZ-KOX (described in co-owned WO 01/53480) with primers AS1 and AS2.
The sequences of these primers are as follows:

AS1: CGATGGATCCATGGGAGAGAAGGCGCTGC (SEQ ID NO:2126)
AS2: GCGTAAAGCTTACGCATAATCCGGCACATCATACGGATAAGAGCCGCCGCCGTCCTTCTGTCTTAAATGGATTT (SEQ ID NO:2127)

The PCR product was gel purified and digested with BamHI and HindIII, then
repurified and cloned into BamHI/HindIII-digested pET23a vector (Novagen), yielding
pET23a/TFZ-HA. A number of clones were picked and sequenced to verify the
correctness of the inserts.

(b) A fragment of approximately 1.2 kb is amplified from the vector
pET23a/TFZ-HA, using the primers ScaRev and GSFwd (Table 10). This fragment
contains the HA-epitope tag sequence (YPYDVPDYA* (SEQ ID NO: 2122)) and part of
the GGGS (SEQ ID NO:1988) linker sequence at the 5' end. Additionally, the GSFwd
primer adds a BamHI site at the extreme 5' end. The ScaRev primer does not contain a
restriction site, but a ScaI site from the vector is present approximately 40 bp downstream
of the primer binding site. This fragment is cut with BamHI and ScaI and inserted into
similarly cut pET23a.

(c) Zif268 finger 1 is then amplified using the PCR primers Zif1Fwd and Zif1Rev
(Table 10), which add a BglII site at the 5' end and both AgeI and BamHI sites at the 3'
end. This construct is then cut with BglII and BamHI and inserted into the vector
construct made in step (b), which has been linearised with BamHI. At this stage the new
construct, termed pET23aZif1HA, is sequenced to find correctly oriented zinc finger
inserts.

(d) Oligonucleotides encoding zinc finger modules for the C-terminus of the 3-finger
constructs (cassette C) are amplified using the primers CasCxFwd and CasCxRev
(where x is 1 to 8, see Table 10). These cassettes are then digested with the restriction
enzyme BamHI and inserted into BamHI-cut, dephosphorylated pET23aZif1HA. At this
stage the new vector construct is not recircularised.


(e) Oligonucleotides encoding zinc finger modules for cassette B are amplified
using primers CasBxFwd and CasBxRev (where x is 9 to 16, see Table 10). These
fragments are cut with the enzymes XmaI and AgeI at 37°C for 1-2 hours. The linear
vector produced in stage (d) above is also cut with AgeI and XmaI (as described), and
dephosphorylated. Digested cassette B fragments are ligated into AgeI/XmaI-cut vector,
in the presence of the restriction enzymes AgeI and XmaI, at room temperature for 16
hours. During this incubation incorrectly ligated fragments are re-digested and re-ligated
repeatedly, until the majority (or all) of the inserts are in the desired orientation. Correct
3-finger constructs have the assembly depicted in Figure 6.


(f) Finally, 3-finger constructs are amplified from the ligated vector (produced in
step (e)) using the primers pETFwd1 (Table 5) and pETRev1 (Table 10). 1 µl of each
ligation mixture is amplified in a 10 µl (total volume) PCR reaction for 30 cycles.
Alternatively, the ligated vector can be transformed into bacteria to produce samples
containing single zinc finger clones.

The above procedure results in the majority of PCR products being the correct 3-finger
constructs, so that any incorrect fragments will not significantly affect the selection
protocol, and the PCR products can be used for screening without further processing.
Alternatively, 3-finger PCR products may be purified from an agarose gel before use.
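The orientation selection in step (e) can be rationalised from the cut positions of the two enzymes. The sketch below assumes the standard cut positions A^CCGGT for AgeI and C^CCGGG for XmaI (both leave a 5'-CCGG overhang); the function names are hypothetical. It shows that a junction formed between an AgeI-cut end and an XmaI-cut end is a hybrid sequence that neither enzyme recognises, whereas a junction between two like ends regenerates a cleavable site.

```python
# (recognition site, number of top-strand bases left of the cut); assumed standard values.
ENZYMES = {"AgeI": ("ACCGGT", 1), "XmaI": ("CCCGGG", 1)}


def overhang(enzyme):
    """5' overhang left by a palindromic cutter, read 5'->3'."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def junction(left_enzyme, right_enzyme):
    """Top-strand sequence formed when the end left by `left_enzyme` (upstream
    fragment) is ligated to the end left by `right_enzyme` (downstream fragment)."""
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


def recleavable(seq):
    """True if the junction regenerates a site for either enzyme present in the ligation."""
    return any(site in seq for site, _ in ENZYMES.values())


if __name__ == "__main__":
    print(overhang("AgeI"), overhang("XmaI"))        # compatible ends
    for pair in [("AgeI", "XmaI"), ("XmaI", "AgeI"), ("AgeI", "AgeI"), ("XmaI", "XmaI")]:
        j = junction(*pair)
        print(pair, j, "recleavable" if recleavable(j) else "stable")
```

On this reading, an insert ligated in the desired orientation yields only hybrid junctions and is stable in the presence of both enzymes, while a wrongly oriented insert regenerates AgeI or XmaI sites and is cut again. Note also that the hybrid junction ACCGGG followed by GAGAAGCCC still encodes T-G-E-K-P.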

b. Screening of the Library Against 5'-GCG-TGG-GCG-3'

Members of the zinc finger library can be screened against the desired target site from a
mixed population of clones, or from individual clones as described in Example 4,
Protocol A or Protocol B (above), respectively. The target site for the screen is produced
by annealing the oligonucleotides Zif b site (AS154) and Zif site RC (AS155), as before.
Template for protein expression is in each case made by PCR using primers pETFwd1
(Table 5) and pETRev1 (Table 10). 1 µl of each PCR reaction is used to express protein
and screen for binding to the Zif site in the manner described in Example 4. The DNA
corresponding to the samples giving the highest fluorescence signals is collected, purified
from a 1% TAE-agarose gel, and sequenced to determine the sequence of the optimal
binding 3-finger peptide.

Example 6: Reduced Human Zinc Finger Module Library for Universal DNA
Recognition.

A library system similar to that described in Example 5 can be constructed using zinc
finger modules from databases such as those in Examples 1, 2 and 3 to select 2-finger
units which bind any 2-finger (6 bp) recognition sequence. There are only 4096 (= 4^6)
unique 6 bp sequences; therefore, a 2-finger library of natural zinc fingers (from specific
animals, plants or fungi) can easily be constructed with enough variability to provide a
specific 2-finger combination for optimal binding to any 6 bp target site. Again, to
reduce the number of natural zinc finger modules that have to be constructed, a small
selection of natural zinc finger modules (e.g., 3) is chosen for each 3 bp binding
sequence (according to their predicted or determined recognition sequence). There are 64
(= 4^3) possible 3 bp binding sequences, so in the first instance fewer than 200 (i.e., 192)
natural zinc finger modules are constructed. These approximately 200 zinc finger modules
can be in either of 2 possible positions in the 2-finger construct, which gives approximately
40,000 (= 200^2) combinations of fingers to bind the 4096 possible 6 bp target sites. As in
Example 5, these 2-finger units are attached to Zif268 finger 1, which acts as an anchor
for DNA recognition.
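The library-size figures quoted above follow from simple counting. A minimal sketch of the arithmetic, with illustrative variable names, is:

```python
BASES = 4                      # A, C, G, T

target_sites_6bp = BASES ** 6  # distinct 6 bp (2-finger) target sequences
triplets = BASES ** 3          # distinct 3 bp subsites, one per finger
modules_per_triplet = 3        # "e.g., 3" modules chosen per triplet in the text
modules = triplets * modules_per_triplet

# Each module may occupy either of the two variable positions.
two_finger_exact = modules ** 2      # using 192 modules
two_finger_rounded = 200 ** 2        # the text's rounded figure

if __name__ == "__main__":
    print(target_sites_6bp, triplets, modules, two_finger_exact, two_finger_rounded)
```

With exactly 192 modules the number of 2-finger combinations is 36,864, which the text rounds up to approximately 40,000 by taking 200 modules; either way the library exceeds the 4096 possible 6 bp target sites by roughly ninefold to tenfold.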

a. Library Construction

The selected zinc finger modules are reverse translated from their amino acid sequences
and synthesised as oligonucleotides. Double stranded zinc finger cassettes for both
N-terminal and C-terminal fingers are created by PCR using primers specific for the
relevant zinc finger module. Each zinc finger module is amplified in 2 separate reactions,
as described in Example 5. The first PCR reaction uses primers which add TGEKP (SEQ
ID NO:3) linker peptides and AgeI and XmaI restriction sites, to the 3' and 5' ends,
respectively, to generate cassette B fragments. The second PCR reaction generates
cassette C fragments by adding a TGEKP (SEQ ID NO:3) linker and an XmaI site at the
5' end (this primer is the same as that used in cassette B production), and a sequence
encoding the sequence LRQKDGGGS (SEQ ID NO:2125) and a BamHI restriction site at
the 3' end. The final constructs are similar to that represented in Figure 6.
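Reverse translation of a module can be sketched as a table lookup. The codon table below is an assumption: it is inferred from the codons that appear in the module sequences of Table 9 (one codon per amino acid), not taken from a table given in the text. The check uses the WT1F3 module (SEQ ID NO:2078); its peptide is obtained here simply by translating the listed nucleotide sequence.

```python
# One codon per amino acid, inferred from the Table 9 module sequences (assumption).
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}
CODON_TO_AA = {codon: aa for aa, codon in PREFERRED_CODON.items()}


def reverse_translate(peptide):
    """Return a coding sequence for `peptide` using one fixed codon per residue."""
    return "".join(PREFERRED_CODON[aa] for aa in peptide.upper())


def translate(dna):
    """Translate a sequence that uses only the codons in PREFERRED_CODON."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


# WT1F3 module from Table 9, joined across the line break.
WT1F3_DNA = (
    "TTCCAGTGCAAGACCTGCCAGCGCAAGTTCAGCCGCAG"
    "CGACCACCTGAAGACCCACACCCGCACCCAC"
)
WT1F3_PEPTIDE = translate(WT1F3_DNA)

if __name__ == "__main__":
    print(WT1F3_PEPTIDE)
    print(reverse_translate(WT1F3_PEPTIDE) == WT1F3_DNA)
```

A fixed one-codon-per-residue table keeps the sketch deterministic; a real design might vary codons to avoid introducing unwanted restriction sites such as AgeI, XmaI or BamHI inside a module.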

b. Library Selection

The collection of 3-finger zinc finger peptides produced above can be used to obtain
specific domains for binding desired target sequences. Two exemplary approaches are
described below.

i). Non-Cloning Selections.

A library constructed as described herein can be used to select optimal zinc finger
domains for binding to any specified binding site. For instance, to select a peptide which
binds the sequence 5'-GGA-TAA-3', the binding site formed by annealing the
oligonucleotides #1#5#6.b and #1#5#6 RC (Table 6, above) can be used as a target site
(5'-GGA-TAA-GCG-3'). Selection of a zinc finger domain to bind such a target can be
conducted, for example, in the manner described in Example 4. Briefly, the zinc finger
library is diluted into 100 or more sub-libraries, which are screened as described above.
The most active sub-libraries collected are further diluted to create much smaller
sub-libraries, which are screened again, and so on. Following such a protocol, a library of
40,000 members can be fully screened and a high-affinity binder selected in just 3 rounds.

This selection procedure provides an extremely rapid method to select zinc finger
peptides to bind any desired target site. The procedure also has the advantages of
eliminating the need for cloning (as is required for methods such as phage display, see
below), and is not limited by library size.
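The figure of 3 rounds for a 40,000-member library can be checked with a short calculation. The sketch below assumes an idealised scheme in which each round splits the current pool into 100 equal sub-libraries and carries forward only the single most active one; the function name is hypothetical.

```python
import math


def rounds_to_single_clone(library_size, sublibraries_per_round=100):
    """Number of dilute-and-screen rounds until the carried-forward pool holds one clone."""
    rounds = 0
    pool = library_size
    while pool > 1:
        # Each sub-library holds this many members after splitting the current pool.
        pool = math.ceil(pool / sublibraries_per_round)
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(rounds_to_single_clone(40_000))
```

Under this assumption the pool shrinks from 40,000 to 400 to 4 to 1 members, which matches the 3 rounds stated in the text. Carrying forward several active sub-libraries per round, as the text allows, changes the screening load per round but not this basic scaling.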

ii). Phage Library Selections

Zinc finger polypeptide phage display libraries are made and used to select clones
encoding peptides that bind the desired nucleotide sequence, as described in co-owned
WO 98/53057. An exemplary phage display library contains peptides which bind target
sites with the sequence 5'-XXX-XXX-GCG-3', where X can be any nucleotide. Hence,
libraries of phage can be selected using the same target sites as described above. The
selection protocol for zinc fingers displayed on phage is briefly described below.

Protocol

The selection protocol is adapted from that described in co-owned international patent
application WO 98/53057.


The 3-finger constructs of the present Example are PCR amplified using universal
forward and reverse primers which contain sites for SfiI and NotI, respectively (called
NatPhageF and NatPhageR, respectively).

NatPhageF: GCAACTGCGGCCCAGCCGGCCATGGCAGAGGAACGCCCGTATG (SEQ ID NO:2128)

NatPhageR: GAGTCATTCTGCGGCCGCGTCCTTCTGGCGCAGGTG (SEQ ID NO:2129)

The forward primer in addition introduces Met-Ala-Glu as the first three amino acid
residues of the zinc finger polypeptides, and these are followed by the residues of the
wild type or library zinc finger polypeptides as required. Cloning overhangs are
produced by digestion with SfiI and NotI where necessary. Nucleic acid encoding zinc
finger polypeptide fragments is ligated into similarly prepared Fd-Tet-SN vector. This is
a derivative of fd-tet-DOG1 (Hoogenboom et al. (1991) Nucl. Acids Res. 19:4133-4137),
in which a section of the pelB leader and a restriction site for the enzyme SfiI (underlined)
have been added by site-directed mutagenesis using the oligonucleotide:

5' CTCCTGCAGTTGGACCTGTGCCATGGCCGGCTGGGCCGCATAGAATGGAACAACTAAAGC 3' (SEQ ID NO:2130)

that anneals in the region of the polylinker. Electrocompetent DH5α cells are
transformed with recombinant vector in 200 ng aliquots, grown for 1 hour in 2xTY
medium with 1% glucose, and plated on TYE containing 15 µg/ml tetracycline and 1%
glucose.

To generate phage for selections, tetracycline resistant colonies are transferred from
plates into 2xTY medium (16 g/litre Bacto tryptone, 10 g/litre Bacto yeast extract, 5 g/litre
NaCl) containing 50 µM ZnCl2 and 15 µg/ml tetracycline, and cultured overnight at 30°C
in a shaking incubator. Cleared culture supernatant containing phage particles is obtained
by centrifuging at 300 x g for 5 minutes.


Double stranded binding sites for use in selections are generated by annealing
complementary oligonucleotides, one of which is biotinylated.

Biotinylated DNA target sites (1 pmol) are bound to streptavidin-coated wells (Roche).
Phage supernatant solutions are diluted 1:10 in PBS selection buffer (PBS containing 50
µM ZnCl2, 2% Marvel, 1% Tween, 20 µg/ml sonicated salmon sperm DNA, and 10-fold
excess of competitor DNA), and 200 µl is applied to each well for 1 hour at 20°C. After
this time, the wells are emptied and washed 18 times with PBS containing 50 µM ZnCl2
and 1% Tween and 2 times in PBS containing 50 µM ZnCl2. Retained phage are eluted in
100 µl 0.1 M triethylamine and neutralised with an equal volume of 1 M Tris (pH 7.4).
Logarithmic-phase E. coli JM109 (100 µl) are infected with eluted phage (100 µl), and
used to prepare phage supernatants for subsequent rounds of selection. After 4 rounds of
selection, a 'pool' or 'mini-population' of phage is obtained, which bind the specified
target sequence. These pools of phage can be stored at -70°C for later use. Additionally,
E. coli infected with these phage pools can be plated to obtain individual clones, which
can be tested by ELISA for binding affinity and specificity to obtain the 'best' clone (see
Example 9, Quality Control).

Example 7: Complete Human Zinc Finger Module Library for Universal DNA
Recognition.

A complete, or nearly complete, library containing all zinc finger sequences which bind
a particular target site can be constructed using zinc finger modules to select 2-finger (or
3-finger) units which bind any 6 bp (or 9 bp) recognition sequence. Two exemplary
methods for construction of such a library are described.

a. Oligonucleotide-Based Library Construction. 

All zinc finger modules may be synthesised as a single stranded oligonucleotide, as
described in Example 4. Zinc finger modules are made double stranded and TGEKP
(SEQ ID NO:3) linkers added by PCR with 5' and 3' primers specific for each individual
zinc finger module, to make cassettes. These cassettes can then be recombined, as
described in Example 5, to make random or deliberate combinations of zinc finger
modules comprising 2, 3, or more linked fingers.

b. PCR-Based Library Construction.

Zinc finger proteins (especially of the Cys2His2 family) form the second most abundant
family of proteins in the human genome. Furthermore, in nature, zinc finger modules are
often linked by the canonical linker peptide TGEKP (SEQ ID NO:3), which begins
immediately after the second zinc-coordinating histidine residue. Therefore, the peptide
sequence HTGEKP (SEQ ID NO:2131) is commonly found between natural zinc finger
modules. Because of this consensus sequence, it has been possible to clone natural zinc
finger modules from the human genome (Becker, K.G., Nagel, J.W., Canning, R.D.,
Biddison, W.E., Ozato, K. & Drew, P.D. (1995) Hum. Mol. Genet. 4: 685-691; Bray, P.,
Lichter, P., Thiesen, H.-J., Ward, D.C. & Dawid, I.B. (1991) Proc. Natl. Acad. Sci. USA
88: 9563-9567), and the Arabidopsis genome (Meissner, R. & Michael, A.J. (1997) Plant
Mol. Biol. 33: 615-624), using redundant primers for PCR. See also Pellegrino et al.
(1991) Proc. Natl. Acad. Sci. USA 88:671-675. It is preferable to use genomic DNA or a
genomic DNA (gDNA) library, rather than a cDNA library, because transcription factors,
such as zinc finger proteins, are strongly regulated during the cell cycle, during
development and in response to extracellular signals. Hence, a cDNA library will probably
not contain the majority of zinc finger proteins, and will be biased towards highly
expressed proteins.

A suitable protocol for the PCR-extraction of zinc finger modules from human genomic
DNA follows:

Genomic DNA is purified directly from human cells, or provided by a gDNA library.
gDNA libraries are preferable as they are commercially available (for example from
Clontech, ATCC, Stratagene, etc.) and can be easily manipulated. PCR to extract zinc
finger modules can be conducted directly on purified gDNA, or the gDNA library can be
screened for zinc fingers containing the HTGEKP (SEQ ID NO:2131) motif before
carrying out PCR. To screen the gDNA library, any method known to one of skill in the
art, e.g., colony hybridisation, can be used. Phage containing gDNA inserts are plated
onto Escherichia coli XL-1 Blue bacterial lawns. At least 10^ phage plaques are
transferred to replica filters and screened with, for example, a 27-mer 32P-radiolabelled
degenerate oligonucleotide, which anneals to the conserved linker region of zinc finger
proteins and adjacent sequences. The sequence of a suitable degenerate probe (SEQ ID
NO:2132), and the amino acid sequence (SEQ ID NO:2133) to which it corresponds, are
shown below.

C^///g ^/c/g ca'^/t Ac'^/g Gg'^/g ga^/a aa^/a cc'^/t t^///t 

R/L I/T/M H T G E K P Y/F



Hybridisation is performed, e.g., for 16 hours at 42-50 °C, following which filters are
washed 3-5 times, to remove non-specifically bound probe, in 0.2x standard saline citrate
(SSC)/0.1% SDS. Filters are then subjected to autoradiography or phosphorimaging to
determine positive plaques.



Positive plaques are picked into log-phase E. coli XL-1 Blue bacterial cultures and the
phage are harvested for PCR. 1 µl phage supernatant is added to 49 µl PCR pre-mix,
containing the oligonucleotide primers TGEKPfor (SEQ ID NO:2134) and TGEKPrev
(SEQ ID NO:2135) (shown below, annealing sequence in bold), and zinc finger modules
are amplified by 30 cycles of PCR. TGEKPfor (SEQ ID NO:2134) and TGEKPrev (SEQ
ID NO:2135) also contain XbaI and EcoRI restriction sites (underlined), respectively.
PCR products are separated on 1.5% TAE-agarose gels and fragments of approximately
120 bp (corresponding to 1 zinc finger module plus flanking sequences) are purified, as
described in Example 4. Additionally, fragments of approximately 220 bp, corresponding
to natural 2-finger units, can also be collected and used. Such products can be digested
with XbaI and EcoRI and cloned into a vector that has been digested so as to generate
compatible ends, such as, for example, pcDNA3.1(-) (Invitrogen) digested with EcoRI
and XbaI. Such a vector pool can then be used as a source for natural 1- or 2-zinc finger
modules, from which to construct 2- or 3-zinc finger peptides for selections as described
above. Zinc finger modules for cassette B can be amplified from such vectors using the
universal primers TGEKPXma (SEQ ID NO:2136) and TGEKPAge (SEQ ID NO:2137),
which anneal to the conserved TGEKP (SEQ ID NO:3) linker regions and add restriction
sites for the enzymes XmaI at the 5' terminus and AgeI at the 3' terminus, respectively
(restriction sites underlined). Cassette C units can be amplified using the primer
TGEKPXma (SEQ ID NO:2136) and TGEKPend (SEQ ID NO:2138), which adds a 3'
TRQKDGGGS (SEQ ID NO:2139) sequence incorporating a BamHI site (underlined, see
below). Two- and 3-finger constructs can then be constructed and screened as described
in the Examples above.



TGEKPfor: TTAGICTAGA^/gCA^/tAC^/gGG%GA^/aAA^/aCC (SEQ ID 
10 NO:2134) 

TGEKPrev: TACTGMI1C^/aGG^/tTT^/tTc\cC^/cGT^/aTG (SEQ ID 
NO:2135) 

TGEKPXma: TCTAGA^/gCA^/tCCCGGGGA^/aAA^/aCC (SEQ ID NO:2136) 

TGEKPAge: GAATTC^/ a GG^/tTT^/tT CACCGGT% TG (SEQ ID NO:2137) 

15 TGEKPend: AGTGTGGTGGAATTC^/aGGGGATCCGCCGCCGTC%TT 
^/tTG%CG^/cGT^/aTG (SEQ ID NO:2138) 






Example 8. Microarray Analysis.

Microarray analysis can also be used to determine the binding site specificity of 2- and
3-finger peptides. For example, a 3-zinc finger library, with finger 1 fixed as Zif268 finger
1, recognises the sequence 5'-XXX-XXX-GCG-3', where X is any specified nucleotide.
Hence, there are 4096 (= 4^6) unique binding sites for such a library. All 4096 of these
sites can be arrayed onto a single glass slide, allowing a specified 2-finger peptide to be
screened against every possible binding site at once. A suitable protocol for such an
experiment is described in Martha L. Bulyk, Xiaohua Huang, Yen Choo, & George
M. Church (Proc. Natl. Acad. Sci. USA, Vol. 98, No. 13, 7158-7163, June 19, 2001),
which is incorporated by reference in its entirety. See also co-owned WO 01/25417, the
disclosure of which is hereby incorporated by reference in its entirety.

The amount of binding to each target sequence can be visualised and quantified using
simple fluorescence measurements. For example, the zinc finger peptide can be
expressed in vitro, or on the surface of phage. Isolated zinc finger peptides may contain
an epitope tag for labelling purposes, whereas bound phage can be detected using a
primary antibody against a phage coat protein, such as gVIII. A secondary antibody, such
as one conjugated to R-phycoerythrin, may be used to provide a visible signal when a
suitable substrate is applied.
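The full set of array features described in this Example can be enumerated programmatically. The following sketch, in which the function name and the output format are illustrative choices, generates every 5'-XXX-XXX-GCG-3' site with the anchoring triplet held fixed:

```python
from itertools import product


def library_target_sites(anchor="GCG"):
    """List every 9 bp site of the form 5'-XXX-XXX-<anchor>-3'."""
    sites = []
    for bases in product("ACGT", repeat=6):
        six = "".join(bases)
        sites.append(f"5'-{six[:3]}-{six[3:]}-{anchor}-3'")
    return sites


if __name__ == "__main__":
    sites = library_target_sites()
    print(len(sites))
    print(sites[0], sites[-1])
```

The list includes the two target sites used in the earlier Examples, 5'-GCG-TGG-GCG-3' and 5'-GGA-TAA-GCG-3'.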

Example 9. Quality Control.

Particular 2- or 3-finger peptides can be screened to determine their specificity or affinity,
as desired.

a. Phage ELISA Assay

Phage supernatants from Round 4 of selection (Example 6, supra) are used to infect
E. coli JM109 bacteria, and grown to prepare fresh supernatants for zinc finger phage
ELISA, using standard procedures as described previously (Choo, Y. & Klug, A. (1994)
Proc. Natl. Acad. Sci. USA 91, 11163-11167; Choo, Y. & Klug, A. (1994) Proc. Natl.
Acad. Sci. USA 91, 11168-11172). Briefly, 5'-biotinylated, positionally randomised
oligonucleotide libraries, containing Zif268 binding site variants, are synthesised by
annealing complementary oligonucleotides as described supra. DNA libraries are added
to streptavidin-coated ELISA wells (Boehringer-Mannheim) in PBS containing 50 µM
ZnCl2 (PBS/Zn). Phage solution (overnight bacterial culture supernatant diluted 1:10 in
PBS/Zn containing 2% Marvel, 1% Tween and 20 µg/ml sonicated salmon sperm DNA) is
applied to each well (50 µl/well). Binding is allowed to proceed for one hour at 20°C.
Unbound phage are removed by washing 7 times with PBS/Zn containing 1% Tween,
then 3 times with PBS/Zn. Bound phage are detected by ELISA using horseradish
peroxidase-conjugated anti-M13 IgG (Pharmacia Biotech) and the colourimetric signal is
quantitated using SOFTMAX 2.32 (Molecular Devices).

For rapid validation, the entire population of phage from Round 4 selection can be
assayed in two ELISA wells: one containing the target DNA binding site, and one
containing a control DNA binding site with between 1 and 5 base changes from the target
sequence. A selection is deemed to be successful if the ELISA signal (representing DNA
binding) is higher in the target well than in the control well.

The higher the signal measured above, the greater the population of specific binding
clones. However, individual low values for such a procedure do not necessarily indicate
a failure of the selection, as there may be individual high affinity / specificity clones
within the round 4 phage population that may be masked by other non-specific clones.
Nevertheless, this assay provides a quick profile of the overall quality of selection.




For a more detailed validation, individual phage clones are recovered from Round 4 by
plating out infected bacterial colonies on agar. Fresh phage supernatants are prepared
from these colonies and assayed by ELISA, as described above.

Finally, the coding sequence of individual zinc finger clones can be amplified by PCR
using external primers complementary to phage sequence, and the PCR products are then
sequenced to determine the amino acid sequence of the selected zinc fingers.

As an alternative, individual 3-finger peptides can be analysed by gel-shift assays or by
microarray screening, as described infra. See also WO 00/41566, WO 00/42219 and
WO 01/25417.

b. Gel-Shift Assay

Peptides are assayed using 32P end-labelled synthetic oligonucleotide duplexes containing
the appropriate binding site sequences.

DNA binding reactions contain the appropriate zinc-finger peptide, binding site and 1 µg
competitor DNA (e.g., poly dI-dC or salmon sperm DNA) in a total volume of 10 µl,
which contains: 20 mM Bis-tris propane (pH 7.0), 100 mM NaCl, 5 mM MgCl2, 50 µM
ZnCl2, 5 mM DTT, 0.1 mg/ml BSA, 0.1% Nonidet P40. Incubations are performed at
room temperature for 1 hour.

To determine the concentration of zinc finger peptide produced in the in vitro expression
system, crude protein samples are used in gel-shift assays against a dilution series of the
appropriate binding site. Binding site concentration is always well above the Kd of the
peptide, but ranges from a higher concentration than the peptide (80 mM), at which all
available peptide binds DNA, to a lower concentration (3-5 mM), at which all DNA is
bound. Controls are carried out to ensure that binding sites are not shifted (i.e., bound) in
the absence of zinc finger peptide. The reaction mixtures are then separated on a 7%
native polyacrylamide gel. Radioactive signals are quantitated by PhosphorImager
analysis to determine the amount of shifted binding site, and hence, the concentration of
active zinc finger peptide.

Dissociation constants (Kd) are determined in parallel to the calculation of active peptide
concentration. For determination of Kd, serial 3-, 4- or 5-fold dilutions of crude peptide are
made and incubated with radiolabelled binding site (10 pM - 10 nM, depending on the
peptide), as above. Samples are run on 7% native polyacrylamide gels and the
radioactive signals quantitated by PhosphorImager analysis. The data are then analysed
according to linear transformation of the binding equation and plotted in CA-Cricket
Graph III (Computer Associates Inc. NY) to generate the apparent dissociation constants.
The Kd values reported are the average of at least two separate determinations.
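The text does not state which linear transformation of the binding equation is used. As one possibility, the sketch below assumes a simple 1:1 binding model, theta = [P] / ([P] + Kd), with the labelled binding site at a concentration well below Kd so that free peptide approximates total active peptide. Taking reciprocals gives 1/theta = 1 + Kd * (1/[P]), so a straight-line fit of 1/theta against 1/[P] yields Kd as slope divided by intercept. The function name and the synthetic data are illustrative only.

```python
def fit_kd_double_reciprocal(peptide_conc, fraction_bound):
    """Least-squares fit of 1/theta = intercept + slope * (1/[P]); returns apparent Kd.

    peptide_conc   -- active peptide concentrations (any consistent unit, e.g. nM)
    fraction_bound -- fraction of binding site shifted at each concentration (0 < theta <= 1)
    """
    xs = [1.0 / p for p in peptide_conc]
    ys = [1.0 / f for f in fraction_bound]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For the ideal model the intercept is 1; dividing by it tolerates small deviations.
    return slope / intercept


if __name__ == "__main__":
    true_kd = 2.0                                   # nM, synthetic example value
    conc = [0.5, 1.0, 2.0, 4.0, 8.0]                # nM, a serial 2-fold dilution for illustration
    theta = [p / (p + true_kd) for p in conc]       # noise-free synthetic fractions bound
    print(fit_kd_double_reciprocal(conc, theta))
```

Double-reciprocal fits weight the most dilute points heavily, so in practice replicate determinations, as the text describes, or a direct non-linear fit may be preferable.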

c. Microarray Assay

Selected zinc finger domains can also be assayed for binding site specificity using the
microarray analysis outlined in Example 8.

All publications mentioned in the above specification are herein incorporated by
reference. Various modifications and variations of the described methods and system of
the invention will be apparent to those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has been described in connection with
specific preferred embodiments, it should be understood that the invention as claimed
should not be unduly limited to such specific embodiments. Indeed, various
modifications of the described modes for carrying out the invention which are apparent to
those skilled in molecular biology or related fields are intended to be within the scope of
the following claims.

CLAIMS

1. A composite binding polypeptide comprising a first natural binding domain
derived from a first natural binding polypeptide, and a second natural binding domain
derived from a second natural binding polypeptide, wherein said first and second natural
binding polypeptides may be the same or different; which polypeptide binds to a target,
said target differing from the natural target of both the first and the second binding
polypeptides.

2. A composite polypeptide according to claim 1, wherein said first and second 
natural binding polypeptides are different polypeptides. 

3. A composite polypeptide according to claim 1 or claim 2, comprising three or 
more natural binding domains. 

4. A composite polypeptide according to any preceding claim, wherein the binding 
domains are nucleic acid binding domains. 

5. A composite polypeptide according to claim 4, which is a nucleic acid binding 
polypeptide. 

6. A composite polypeptide according to claim 4 or claim 5 which is a zinc finger 
polypeptide, and the natural binding domains are zinc finger domains. 

7. A composite polypeptide according to claim 6, which comprises a Cys2-His2 zinc 
finger binding domain. 

8. A composite polypeptide according to claim 6 or claim 7, which comprises a 
Cys3-His zinc finger binding domain. 

9. A composite polypeptide according to any preceding claim, which comprises 6 or
more natural binding domains.

10. A composite polypeptide according to claim 9, wherein 6 natural binding domains
are arranged in a 3x2 conformation, separated by linker sequences.

11. A chimeric polypeptide comprising: 

(a) a binding polypeptide according to any preceding claim, and 

(b) a biological effector domain. 

11. A library of natural binding domains. 

12. A library according to claim 11, comprising a plurality of natural binding domains 
from which a polypeptide according to any one of claims 1 to 10 can be assembled. 

13. A library of natural zinc finger nucleic acid binding domains, wherein said zinc 
finger domains comprise a linker attached thereto. 

14. A library according to claim 13, wherein the linker comprises the sequence
TGEKP.

15. A method for selecting a binding polypeptide capable of binding to a target site, 
comprising: 

(a) providing a library of natural binding domains; 

(b) assembling two or more of said domains to form a composite polypeptide; 

(c) screening said composite polypeptide against the target site in order to 
determine its ability to bind the target site. 

16. A method according to claim 15, wherein the natural binding domains are zinc 
finger binding domains. 

17. A method according to claim 15 or claim 16, wherein two or more composite 
polypeptides comprising two or more domains which are selected for binding to two or 
more target sites are combined to provide a composite polypeptide which binds to an
aggregate binding site comprising the two or more target binding sites. 

18. A method for designing a composite binding polypeptide, comprising: 

(a) providing information defining a target site; 

(b) selecting, from a database of natural binding domains, sequences of binding 
domains which are predicted to bind to the target site by the application of one or more 
rules which define target binding interactions for the binding domains; and 

(c) displaying the sequences of the binding domains, separated by linker
sequences, and optionally assembling the binding polypeptide from a library of said 
domains. 

19. A method according to claim 18, wherein the binding domains are zinc finger
domains. 

20. A method according to claim 19, wherein the zinc fingers are considered to bind
to a nucleic acid triplet and domains are selected according to one or more of the
following rules:

(a) if the 5' base in the triplet is G, then position +6 in the α-helix is Arg; or
position +6 is Ser or Thr and position ++2 is Asp;

(b) if the 5' base in the triplet is A, then position +6 in the α-helix is Gln and ++2
is not Asp;

(c) if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr
and position ++2 is Asp;

(d) if the 5' base in the triplet is C, then position +6 in the α-helix may be any
amino acid, provided that position ++2 in the α-helix is not Asp;

(e) if the central base in the triplet is G, then position +3 in the α-helix is His;

(f) if the central base in the triplet is A, then position +3 in the α-helix is Asn;

(g) if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser
or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if the central base in the triplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if the 3' base in the triplet is G, then position -1 in the α-helix is Arg;

(j) if the 3' base in the triplet is A, then position -1 in the α-helix is Gln;

(k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn or Gln;

(l) if the 3' base in the triplet is C, then position -1 in the α-helix is Asp.
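The triplet rules of this claim can be read as checks on helix positions -1, +3 and +6 of a finger, together with position ++2. The sketch below is one hypothetical encoding. The function names are not from the disclosure, the set of "small" residues is an assumption because the claim does not enumerate them, and the code applies all twelve rules together although the claim requires only one or more of them.

```python
# Assumption: the claim does not list which residues count as "small".
SMALL = frozenset({"Gly", "Ala", "Ser"})

# Rules (i)-(l): 3' base -> residues allowed at position -1.
THREE_PRIME = {"G": {"Arg"}, "A": {"Gln"}, "T": {"Asn", "Gln"}, "C": {"Asp"}}

# Rules (e), (f), (h): central base -> residues allowed at position +3.
# Rule (g) for T is handled separately because of its proviso.
CENTRAL = {
    "G": {"His"},
    "A": {"Asn"},
    "C": {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"},
}


def five_prime_ok(base, p6, pp2):
    """Rules (a)-(d): 5' base against positions +6 and ++2."""
    if base == "G":
        return p6 == "Arg" or (p6 in ("Ser", "Thr") and pp2 == "Asp")
    if base == "A":
        return p6 == "Gln" and pp2 != "Asp"
    if base == "T":
        return p6 in ("Ser", "Thr") and pp2 == "Asp"
    return pp2 != "Asp"  # base == "C": any residue at +6


def central_ok(base, p3, m1, p6, small=SMALL):
    """Rules (e)-(h): central base against position +3."""
    if base == "T":
        if p3 == "Ala":
            # Proviso of rule (g): -1 or +6 must be a small residue.
            return m1 in small or p6 in small
        return p3 in ("Ser", "Val")
    return p3 in CENTRAL[base]


def finger_matches_triplet(triplet, m1, p3, p6, pp2):
    """Check a finger against a triplet written 5' to 3'.

    m1, p3, p6 are the residues at -1, +3, +6; pp2 is the residue at ++2.
    Residues use three-letter codes, as in the claim.
    """
    if len(triplet) != 3 or any(b not in "GATC" for b in triplet):
        raise ValueError("triplet must be three bases from G, A, T, C")
    five, central, three = triplet
    return (
        five_prime_ok(five, p6, pp2)
        and central_ok(central, p3, m1, p6)
        and m1 in THREE_PRIME[three]
    )


print(finger_matches_triplet("GGG", m1="Arg", p3="His", p6="Arg", pp2="Ser"))
```

Keeping the 5' and central checks as functions, rather than a flat table, preserves the conditions that couple +6 to ++2 and Ala at +3 to the residues at -1 and +6.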

21. A method according to claim 19, wherein the zinc fingers are considered to bind
to a nucleic acid quadruplet and domains are selected according to one or more of the
following rules:

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg or Lys;

(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Glu, Asn or
Val;

(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser, Thr, Val
or Lys;

(d) if base 4 in the quadruplet is C, then position +6 in the α-helix is Ser, Thr, Val,
Ala, Glu or Asn;

(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;

(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;

(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;

(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;

(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is His or Thr;

(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp or His;

(m) if base 1 in the quadruplet is G, then position +2 is Glu;

(n) if base 1 in the quadruplet is A, then position +2 is Arg or Gln;

(o) if base 1 in the quadruplet is C, then position +2 is Asn, Gln, Arg, His or Lys;

(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.
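Because each rule in this claim pairs one quadruplet base with one helix position, the rules can be held as a lookup table. The following sketch is a hypothetical transcription of rules (a) to (p). The names are illustrative, the proviso of rule (g) concerning Ala at position +3 is left out for brevity, and all four positions are checked together although the claim requires only one or more of the rules.

```python
# Which helix position each quadruplet base is paired with in the claim.
POSITION_FOR_BASE = {4: "+6", 3: "+3", 2: "-1", 1: "+2"}

# Allowed residues per helix position and base, from rules (a)-(p).
# The Ala proviso of rule (g) is NOT encoded here.
RULES = {
    "+6": {
        "G": {"Arg", "Lys"},
        "A": {"Glu", "Asn", "Val"},
        "T": {"Ser", "Thr", "Val", "Lys"},
        "C": {"Ser", "Thr", "Val", "Ala", "Glu", "Asn"},
    },
    "+3": {
        "G": {"His"},
        "A": {"Asn"},
        "T": {"Ala", "Ser", "Val"},
        "C": {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"},
    },
    "-1": {
        "G": {"Arg"},
        "A": {"Gln"},
        "T": {"His", "Thr"},
        "C": {"Asp", "His"},
    },
    "+2": {
        "G": {"Glu"},
        "A": {"Arg", "Gln"},
        "C": {"Asn", "Gln", "Arg", "His", "Lys"},
        "T": {"Ser", "Thr"},
    },
}


def allowed_residues(bases):
    """Map a quadruplet to the residues allowed at each helix position.

    bases: dict keyed by base number, e.g. {1: "G", 2: "A", 3: "T", 4: "C"}.
    """
    return {
        POSITION_FOR_BASE[number]: RULES[POSITION_FOR_BASE[number]][base]
        for number, base in bases.items()
    }


def helix_matches(bases, helix):
    """True if every helix residue is allowed for its paired base.

    helix: dict keyed by helix position, e.g. {"-1": "Arg", "+2": "Glu", ...}.
    """
    return all(
        helix.get(position) in residues
        for position, residues in allowed_residues(bases).items()
    )


quadruplet = {4: "G", 3: "G", 2: "G", 1: "G"}
print(helix_matches(quadruplet, {"+6": "Arg", "+3": "His", "-1": "Arg", "+2": "Glu"}))
```

Keying the quadruplet by base number avoids committing to a strand orientation that the claim itself does not state.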

22. A method according to claim 19, wherein the zinc fingers are considered to bind
to a nucleic acid quadruplet and domains are selected according to one or more of the
following rules:

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or
position +6 is Ser or Thr and position ++2 is Asp;

(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2
is not Asp;

(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr
and position ++2 is Asp;

(d) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any
amino acid, provided that position ++2 in the α-helix is not Asp;

(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;

(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;

(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;

(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;

(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is Asn or Gln;

(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp;

(m) if base 1 in the quadruplet is G, then position +2 is Asp;

(n) if base 1 in the quadruplet is A, then position +2 is not Asp;

(o) if base 1 in the quadruplet is C, then position +2 is not Asp;

(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.

23. The method of any of claims 18-22, further comprising the step of synthesizing a
polynucleotide encoding the binding polypeptide.

24. A computer-implemented method for designing a zinc finger polypeptide,
comprising the steps of: 

(a) providing a system comprising at least storage means for storing data relating 
to a library of zinc fingers; storage means for storing a rule table; means for inputting 
target nucleic acid sequence data; processing means for generating a result; and means for 
outputting the result; 

(b) inputting sequence data for a target nucleic acid molecule; 

(c) defining a first target zinc finger binding site in said nucleic acid molecule; 

(d) interrogating the zinc finger library and rule table storage means, comparing 
zinc fingers to the target zinc finger binding site according to the rule table and selecting 
zinc finger data identifying a zinc finger capable of binding to said target site;

(e) defining at least one further target zinc finger binding site and repeating step 
(d); and 

(f) outputting the selected zinc finger data. 
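Steps (b) to (f) can be sketched as a short program. The example below is a minimal illustration under stated assumptions, not the claimed system: the `Finger` record, the library entries `ZF_A` and `ZF_B`, and the toy rule table (a small subset of unconditional base-to-residue rules) are hypothetical, and tiling the target into contiguous, non-overlapping triplets is only one possible way of defining successive binding sites.

```python
from dataclasses import dataclass


@dataclass
class Finger:
    name: str    # hypothetical library identifier
    helix: dict  # helix position -> residue, e.g. {"-1": "Arg", "+3": "His"}


# Toy rule table (assumption): index of the base in a 5'->3' triplet ->
# (helix position, base -> allowed residues). Only a few rules are included.
TOY_RULES = {
    0: ("+6", {"G": {"Arg"}}),
    1: ("+3", {"G": {"His"}, "A": {"Asn"}}),
    2: ("-1", {"G": {"Arg"}, "A": {"Gln"}, "C": {"Asp"}}),
}

# Hypothetical library of zinc finger records.
LIBRARY = [
    Finger("ZF_A", {"-1": "Arg", "+3": "His", "+6": "Arg"}),
    Finger("ZF_B", {"-1": "Asp", "+3": "Asn", "+6": "Arg"}),
]


def split_sites(target, size=3):
    """Steps (c) and (e): define successive binding sites in the target.

    Assumption: sites are contiguous and non-overlapping.
    """
    if not target or len(target) % size:
        raise ValueError("target length must be a positive multiple of the site size")
    return [target[i:i + size] for i in range(0, len(target), size)]


def matches(site, finger, rules):
    """Step (d): compare one finger with one site according to the rule table."""
    for index, base in enumerate(site):
        position, allowed = rules[index]
        # A base with no entry in the table matches no finger.
        if finger.helix.get(position) not in allowed.get(base, set()):
            return False
    return True


def design(target, library, rules):
    """Steps (b)-(f): return, for each site, the names of matching fingers."""
    return [
        (site, [f.name for f in library if matches(site, f, rules)])
        for site in split_sites(target)
    ]


for site, names in design("GGGGAC", LIBRARY, TOY_RULES):
    print(site, names)
```

Passing the library and rule table as arguments mirrors the separate storage means recited in step (a), so either can be replaced without changing the selection logic.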

25. A method according to claim 24, further comprising sending instructions to an
automated chemical synthesis system to assemble a zinc finger polypeptide as defined by 
the zinc finger data obtained in (f). 

26. A method according to claim 25, wherein the zinc finger polypeptide is tested for 
binding to the target site, and data from said testing is used to select, from a plurality of
candidates, a zinc finger polypeptide capable of binding to the target site. 

27. A method according to any one of claims 24 to 26, wherein two or more zinc 
finger polypeptides are combined to form a zinc finger polypeptide capable of binding to 
an aggregate binding site comprising two or more target sites. 

27. A method according to claim 24, wherein the rule table comprises rules as set
forth in any one of claims 21 to 23. 